Multiscale Methods for Shape Constraints in Deconvolution: 
Confidence Statements for Qualitative Features 

J. Schmidt-Hieber*†, A. Munk‡ and L. Dümbgen§

Abstract

We derive multiscale statistics for deconvolution in order to detect qualitative features of the unknown density. An important example covered within this framework is to test for local monotonicity on all scales simultaneously. We investigate the moderately ill-posed setting, where the Fourier transform of the error density in the deconvolution model is of polynomial decay. For multiscale testing, we consider a calibration motivated by the modulus of continuity of Brownian motion. We investigate the performance of our results from both the theoretical and simulation based point of view. A major consequence of our work is that the detection of qualitative features of a density in a deconvolution problem is a doable task although the minimax rates for pointwise estimation are very slow.

AMS 2010 Subject Classification: Primary 62G10; secondary 62G15, 62G20.

Keywords: Brownian motion; convexity; pseudo-differential operators; ill-posed problems; mode detection; monotonicity; multiscale statistics; shape constraints.



*Department of Mathematics, Vrije Universiteit Amsterdam, De Boelelaan 1081a, 1081 HV Amsterdam, Netherlands.

†For correspondence: schmidth@math.uni-goettingen.de

‡Institut für Mathematische Stochastik, Universität Göttingen, Goldschmidtstr. 7, 37077 Göttingen and Max-Planck Institute for Biophysical Chemistry, Am Fassberg 11, D-37077 Göttingen, Germany.

§Institut für mathematische Stochastik und Versicherungslehre, Universität Bern, Alpeneggstrasse 22, CH-3012 Bern, Switzerland.



1 Introduction and Notation 

Assume that we observe $Y = (Y_1, \ldots, Y_n)$ according to the deconvolution model

$$Y_i = X_i + \varepsilon_i, \quad i = 1, \ldots, n, \tag{1.1}$$

where $X_i, \varepsilon_i$, $i = 1, \ldots, n$ are assumed to be real valued and independent, $X_i \sim X$, $\varepsilon_i \sim \varepsilon$, and $Y_1, X, \varepsilon$ have densities $g$, $f$ and $f_\varepsilon$, respectively. Our goal is to develop multiscale test statistics for certain structural assumptions on $f$, where the density $f_\varepsilon$ of the blurring distribution is assumed to be known.

Structural assumptions or shape constraints are conveniently expressed in this paper as (pseudo)-differential inequalities of the density $f$, assuming for the moment that $f$ is sufficiently smooth. Important examples are $f' \gtrless 0$ to check local monotonicity properties as well as $f'' \gtrless 0$ for local convexity or concavity. To give another example, suppose that we are interested in local monotonicity properties of the density $\tilde f$ of $\exp(aX)$ for a given $a > 0$. Since $\tilde f(s) = (as)^{-1} f(a^{-1}\log(s))$, one can easily verify that local monotonicity properties of $\tilde f$ may be expressed in terms of the inequalities $f' - af \gtrless 0$.
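The last equivalence follows from $\tilde f'(s) = (as)^{-2}\,(f' - af)(a^{-1}\log s)$ and can be checked numerically. The following is a minimal sketch and not part of the original analysis: the standard normal choice for $f$, the value $a = 0.7$ and all function names are assumptions made for illustration only.

```python
import math

def f(x):
    """Standard normal density, an illustrative choice for f."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def f_prime(x):
    """Derivative of the standard normal density."""
    return -x * f(x)

def f_tilde(s, a):
    """Density of exp(a X): (a s)^{-1} f(a^{-1} log s), for s > 0."""
    return f(math.log(s) / a) / (a * s)

def f_tilde_prime(s, a, eps=1e-6):
    """Central finite difference approximation of the derivative of f_tilde."""
    return (f_tilde(s + eps, a) - f_tilde(s - eps, a)) / (2.0 * eps)

def signs_agree(a, xs):
    """Check that f_tilde'(s) and f'(x) - a f(x) have the same sign at s = exp(a x)."""
    for x in xs:
        s = math.exp(a * x)
        lhs = f_tilde_prime(s, a)
        rhs = f_prime(x) - a * f(x)
        if (lhs > 0) != (rhs > 0):
            return False
    return True

a = 0.7
grid = [-3.0 + 0.25 * k for k in range(25)]  # x in [-3, 3]
print(signs_agree(a, grid))
```

For the normal density the sign of $f' - af$ changes at $x = -a$, so the grid contains points on both sides of the change.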

Hypothesis testing for deconvolution and related inverse problems is a relatively new area. 
Current methods cover testing of parametric assumptions (cf. [Il[33ll6]) and, more recently, 
testing for certain smoothness classes such as Sobolev balls in a Gaussian sequence model 
(Laurent et al. [32l ES] and Ingster et al. [22]). All these papers focused on regression deconvolution models. Exceptions for density deconvolution are Holzmann et al. [23], Balabdaoui et al. [3], and Meister [36], who developed tests for various global hypotheses, such as global monotonicity, based on classical Fourier inversion (see e.g. Carroll and Hall [3)). The latter test has been derived for one fixed interval and allows one to check whether a density is monotone on that interval at a preassigned level of significance.

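Throughout this work let $\mathcal{F}(f) = \int_{\mathbb{R}} \exp(-ix\,\cdot) f(x)\,dx$ denote the Fourier transform of $f \in L^1(\mathbb{R})$ or $f \in L^2(\mathbb{R})$ (depending on the context). As shape constraints, we consider a general class of differential operators $\operatorname{op}(p)$ with symbol $p$, which can be written for nice $f$ as

$$(\operatorname{op}(p)f)(x) = \frac{1}{2\pi}\int e^{ix\xi}\, p(x,\xi)\, \mathcal{F}(f)(\xi)\, d\xi. \tag{1.2}$$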

This class will be an enlargement of (elliptic) pseudo-differential operators by fractional differentiation. Given data from model (1.1), the goal is to identify intervals at a controlled error level on which $\operatorname{Re}(\operatorname{op}(p)f) < 0$ or $\operatorname{Re}(\operatorname{op}(p)f) > 0$, where Re denotes the projection on the real part. If applied to $\operatorname{op}(p) = D$ or $D^2$ (i.e. $p(x,\xi) = i\xi$ and $p(x,\xi) = -\xi^2$, respectively) with the differentiation operator $Df := f'$, our method yields bounds for the number and confidence regions for the location of modes and inflection points of $f$. Moreover, we discuss an example related to Wicksell's problem with shape constraint described by fractional differentiation. Our work can be viewed as an extension of Chaudhuri and Marron [8] as well as Dümbgen and Walther [13], who treated the case $\operatorname{op}(p) = D^m$ (with $m = 1$ in [13]) in the direct case, i.e. when $\varepsilon = 0$. However, the approach in [8] does not allow for sequences of bandwidths tending to zero and yields limit distributions that again depend on unknown quantities. The methods in [13] require a deterministic coupling result. This allows one to consider the multiscale approximation for $f = \mathbb{1}_{[0,1]}$ only and cannot be transferred to deconvolution. Thus, a new theoretical framework as well as completely different proof strategies have to be developed.

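The statistic introduced in this paper investigates shape constraints of the unknown density $f$ on all scales simultaneously. Although qualitative hypotheses such as local monotonicity seem, at first glance, not to be expressible in terms of the Fourier transform, we can make use of the following trick: Define $S_{t,h}(\cdot) = (\cdot - t)/h$ and, for a sufficiently smooth, positive kernel $\phi$ supported on $[0,1]$, consider the test statistic $T_{t,h} := n^{-1/2}\sum_{k=1}^n \operatorname{Re} v_{t,h}(Y_k)$ with

$$v_{t,h}(u) = \frac{1}{2\pi}\int \frac{e^{isu}}{\mathcal{F}(f_\varepsilon)(-s)}\, \mathcal{F}\big(\operatorname{op}(p)^\star(\phi\circ S_{t,h})\big)(s)\, ds,$$

where $\operatorname{op}(p)^\star$ is the adjoint of $\operatorname{op}(p)$ (in a certain space) with respect to the $L^2$-inner product $\langle h_1, h_2\rangle := \int_{\mathbb{R}} h_1(x)\overline{h_2(x)}\, dx$. Then, in expectation, using Parseval's identity,

$$\mathbb{E}\,T_{t,h} = \frac{\sqrt{n}}{2\pi}\operatorname{Re}\int \mathcal{F}\big(\operatorname{op}(p)^\star(\phi\circ S_{t,h})\big)(s)\,\overline{\mathcal{F}(f)(s)}\, ds = \sqrt{n}\,\operatorname{Re}\int \big(\operatorname{op}(p)^\star(\phi\circ S_{t,h})\big)(x) f(x)\, dx = \sqrt{n}\,\big\langle \phi\circ S_{t,h}, \operatorname{Re}\operatorname{op}(p)f\big\rangle \tag{1.3}$$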

for sufficiently regular functions $f$ and $\phi$. As an example, consider $\operatorname{op}(p) = D$. Then, the functions $\phi\circ S_{t,h}$ can serve as localized test functions for local monotonicity in the following sense: Whenever we know that $\langle \phi\circ S_{t,h}, f'\rangle > 0$, we may conclude that $f(s_1) < f(s_2)$ for some points $s_1 < s_2$ in $[t, t+h]$. This gives rise to a multiscale statistic

$$T_n = \sup_{(t,h)} w_h\left(\frac{|T_{t,h} - \mathbb{E}\,T_{t,h}|}{\widehat{\operatorname{Std}}(T_{t,h})} - \tilde w_h\right),$$

where $w_h$ and $\tilde w_h$ are chosen in order to calibrate the different scales with equal weight, while $\widehat{\operatorname{Std}}(T_{t,h})$ is an estimator of the standard deviation of $T_{t,h}$.
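To make the identity (1.3) concrete, consider the simplest situation $\operatorname{op}(p) = D$ without noise ($\varepsilon = 0$). The adjoint of $D$ is $-D$, so that $v_{t,h} = -(\phi\circ S_{t,h})'$ and $\mathbb{E}\, v_{t,h}(X) = \int \phi((x-t)/h) f'(x)\,dx$ by partial integration. The sketch below checks this by numerical quadrature. It is an illustration only: the kernel $\phi(x) = 30x^2(1-x)^2$, the standard normal density $f$, the interval and all function names are assumptions.

```python
import math

def phi(x):
    """Illustrative kernel on [0, 1]; it vanishes at both end points."""
    return 30.0 * x * x * (1.0 - x) ** 2 if 0.0 <= x <= 1.0 else 0.0

def phi_prime(x):
    """Derivative of phi: 60 x (1 - x) (1 - 2x) on [0, 1]."""
    return 60.0 * x * (1.0 - x) * (1.0 - 2.0 * x) if 0.0 <= x <= 1.0 else 0.0

def f(x):
    """Standard normal density, an illustrative choice for f."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def f_prime(x):
    return -x * f(x)

def v(u, t, h):
    """v_{t,h}(u) = -(d/du) phi((u - t)/h) = -phi'((u - t)/h) / h."""
    return -phi_prime((u - t) / h) / h

def integrate(func, lo, hi, n=20000):
    """Composite midpoint rule on [lo, hi]."""
    step = (hi - lo) / n
    return step * sum(func(lo + (k + 0.5) * step) for k in range(n))

def expected_v(t, h):
    """E v_{t,h}(X) = int v_{t,h}(u) f(u) du."""
    return integrate(lambda u: v(u, t, h) * f(u), t, t + h)

def localized_derivative(t, h):
    """<phi o S_{t,h}, f'> = int phi((x - t)/h) f'(x) dx."""
    return integrate(lambda x: phi((x - t) / h) * f_prime(x), t, t + h)

t, h = -1.5, 0.8  # f is increasing on [t, t + h] = [-1.5, -0.7]
print(expected_v(t, h), localized_derivative(t, h))
```

Both numbers agree up to quadrature error and are positive, in line with $f$ being increasing on the chosen interval.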

The key result in this paper is the approximation of $T_n$ by a distribution-free statistic that allows us to compute critical values. As mentioned before, our multiscale calibration requires new techniques in order to determine the speed of convergence between $T_n$ and its approximation. The main tool will be a strong approximation based on the Hungarian construction. This allows us on the one hand to extend the approach of [13], resulting for example in simultaneous confidence statements for the existence and location of regions of increase and decrease. On the other hand, our approach is statistically more informative than pure testing. In fact, for a given shape constraint, we construct objects which appear to be similar to superpositions of confidence bands. These will be denoted as confidence rectangles and allow us to identify regions where the shape constraints expressed in terms of differential inequalities, as mentioned at the beginning of this section, hold with prescribed probability. The strength of this approach lies in the fact that, in contrast to sup-norm bands, all scales can be used simultaneously and the control of the bias becomes dispensable. For a more precise statement see Section 3.

It is a well-known fact (cf. Delaigle and Gijbels [11]) that selection of an appropriate bandwidth is a delicate issue in deconvolution models. One of the main advantages of multiscale methods is that essentially no smoothing parameter is required. The main choice will be the quantile of the multiscale statistic, which has a clear probabilistic interpretation. Furthermore, our multiscale statistic allows us to construct estimators for the number of modes and inflection points which have a number of nice properties: On the one hand, modes and inflection points are detected with the minimax rate of convergence (up to a log-factor). On the other hand, the probability that the true number is overestimated is very low, and completely controlled by the quantile of the multiscale statistic. To state it differently, it is highly unlikely that artefacts will be included in the reconstruction, which is a desirable property in many applications. It is worth noting that no assumptions are made on the number of modes and that no additional model selection penalties are necessary.

This paper deals with the moderately ill-posed case, meaning that the Fourier transform of the blurring distribution decays at polynomial rate. In fact, we work under the well-known assumption of Fan [16] (cf. Assumption 2), which essentially assures that the inversion operator, mapping $g \mapsto f$, is pseudo-differential. This nicely combines with the assumption on the class of shape constraints. Our framework includes many important error distributions such as Exponential, $\chi^2$, Laplace and Gamma distributed random variables. The special case $\varepsilon = 0$ (i.e. no deconvolution or direct problem) can be treated as well, of course.

For practical applications, we may use these models if for instance the error variable $\varepsilon$ is an independent waiting time. For example, let $X_i$ be the (unknown) time of infection of the $i$-th patient, $\varepsilon_i$ the corresponding incubation time, and $Y_i$ the time when diagnosis is made. Then, it is convenient to assume $\varepsilon \sim \Gamma(r, \theta)$ (see for instance [10], Section 3.5). By the techniques developed in this paper one will be able to identify, for example, time intervals where the number of infections increased and decreased for a specified confidence level. Another application is single photon emission computed tomography (SPECT), where the detected scattered photons are blurred by Laplace distributed random variables (cf. Floyd et al. [17], Kacperski et al. [28]).

The paper is organized as follows. In Section 2 we show how distribution-free approximations of multiscale statistics can be derived for general empirical processes under relatively weak conditions. For the precise statement see Theorem 1. These results are transferred to shape constraints and deconvolution models in Section 3. In Section 4 we discuss the statistical consequences and show how confidence statements can be derived. Theoretical questions related to the performance of the multiscale method and numerical aspects are discussed in Sections 5 and 6. In particular, for a number of cases, we are even able to identify the asymptotically optimal kernel function $\phi$ as a beta kernel, where the degree increases with the ill-posedness of the problem. Proofs and further technicalities are shifted to the appendix and a supplementary part, which additionally contains various lemmas, enumerated by B.1, B.2, ..., C.1, C.2, ...

Notation: We write $T$ for the set $[0,1] \times (0,1]$. $\lesssim$ and $\gtrsim$ mean smaller (larger) or equal up to a constant and $\lfloor x\rfloor$ is the largest integer which is not larger than $x$. $\operatorname{supp}\phi$ denotes the support of $\phi$. In the following, $\mathbb{N}$ is the set of non-negative integers. $\langle\cdot,\cdot\rangle$ denotes the $L^2$-inner product and $\|\cdot\|_p$ the $L^p$ norm on $\mathbb{R}$. Furthermore, set $\operatorname{TV}(\cdot)$ for the total variation of functions on $\mathbb{R}$. As customary in the theory of Sobolev spaces, we define $\langle s\rangle := (1 + |s|^2)^{1/2}$. If it is clear from the context, we write $x^k\phi$ to denote the function $x \mapsto x^k\phi(x)$ and similarly $\langle x\rangle^k\phi$ for the function $x \mapsto \langle x\rangle^k\phi(x)$. The Sobolev space $H^r$ is defined as the class of functions with norm

$$\|\phi\|_{H^r} := \Big(\int \langle s\rangle^{2r}\, |\mathcal{F}(\phi)(s)|^2\, ds\Big)^{1/2} < \infty.$$

For any $q$ and $\ell \in \mathbb{N}$, define $H^q_\ell$ as the following Sobolev type space

$$H^q_\ell := \{\psi \mid x^k\psi \in H^q, \text{ for } k = 0, 1, \ldots, \ell\}.$$

The norm on $H^q_\ell$ is given by $\|\psi\|_{H^q_\ell} := \sum_{k=0}^{\ell}\|x^k\psi\|_{H^q}$, for $\psi \in H^q_\ell$.

2 A general multiscale test statistic 

In this section, we shall give a fairly general convergence result which is of interest on its own. The presented result does not use the deconvolution structure of model (1.1). It only requires that we have observations $Y_i = G^{-1}(U_i)$, $i = 1, \ldots, n$ with $U_i$ i.i.d. uniform on $[0,1]$ and $G$ an unknown distribution function with Lebesgue density $g$ in the class

$$\mathcal{G} := \mathcal{G}_{c,C,q} := \big\{G \mid G \text{ is a distribution function with density } g,\ c \le g|_{[0,1]},\ \|g\|_\infty \le c^{-1},\ \text{and } g \in \mathcal{J}(C,q)\big\} \tag{2.1}$$

for fixed $c, C > 0$, $0 \le q < 1/2$, and the Lipschitz type constraint

$$\mathcal{J} := \mathcal{J}(C,q) := \big\{h \mid |\sqrt{h(x)} - \sqrt{h(y)}| \le C(1 + |x| + |y|)^q\, |x - y|, \text{ for all } x, y \in \mathbb{R}\big\}.$$

For a set of real-valued functions $(\psi_{t,h})_{t,h}$ define the test statistic (empirical process) $T_{t,h} = n^{-1/2}\sum_{k=1}^n \psi_{t,h}(Y_k)$. Note that $\operatorname{Std}(T_{t,h}) \approx (\int \psi_{t,h}^2(s)\, g(s)\, ds)^{1/2} \approx \|\psi_{t,h}\|_2\sqrt{g(t)}$ if $\psi_{t,h}$ is localized around $t$. It will turn out later on that one should allow for a slightly regularized standardization and therefore we consider

$$\frac{|T_{t,h} - \mathbb{E}[T_{t,h}]|}{V_{t,h}\sqrt{\hat g_n(t)}}$$

with $V_{t,h} \ge \|\psi_{t,h}\|_2$ and $\hat g_n$ an estimator of $g$, satisfying

$$\sup_{G\in\mathcal{G}} \|\hat g_n - g\|_\infty = O_P(1/\log n). \tag{2.2}$$

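Unless stated otherwise, asymptotic statements refer to $n \to \infty$. We combine the single test statistics for an arbitrary subset

$$B_n \subset \big\{(t,h) \mid t \in [0,1],\ h \in [l_n, u_n]\big\} \tag{2.3}$$

and consider for $\nu > e$ and

$$w_h = \frac{\sqrt{\tfrac12\log\tfrac{\nu}{h}}}{\log\log\tfrac{\nu}{h}} \tag{2.4}$$

distribution-free approximations of the multiscale statistic

$$T_n := \sup_{(t,h)\in B_n} w_h\left(\frac{|T_{t,h} - \mathbb{E}[T_{t,h}]|}{V_{t,h}\sqrt{\hat g_n(t)}} - \sqrt{2\log\tfrac{\nu}{h}}\right). \tag{2.5}$$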

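Assumption 1 (Assumption on test functions). Given functions $(\psi_{t,h})_{(t,h)\in T}$, numbers $(V_{t,h})_{(t,h)\in T}$, and a set $B_n$ of the form (2.3), suppose that the following assumptions hold.

(i) For all $(t,h) \in T$, $\|\psi_{t,h}\|_2 \le V_{t,h}$.

(ii) We have uniform bounds on the norms

$$\sup_{(t,h)\in T} \frac{\sqrt{h}\,\operatorname{TV}(\psi_{t,h}) + \sqrt{h}\,\|\psi_{t,h}\|_\infty + h^{-1/2}\|\psi_{t,h}\|_1}{V_{t,h}} \lesssim 1.$$

(iii) There exists $\alpha > 1/2$, such that

$$\kappa_n := \sup_{(t,h)\in B_n,\ G\in\mathcal{G}} w_h\, \frac{\operatorname{TV}\big(\psi_{t,h}(\cdot)\,[\sqrt{g(\cdot)} - \sqrt{g(t)}]\,\langle\cdot\rangle^{\alpha}\big)}{V_{t,h}} \to 0.$$

(iv) There exists a constant $K$, such that for all $(t,h), (t',h') \in T$,

$$\big(\sqrt{h}\wedge\sqrt{h'}\big)\, \frac{\|\psi_{t,h} - \psi_{t',h'}\|_2 + |V_{t,h} - V_{t',h'}|}{V_{t,h}\vee V_{t',h'}} \le K\sqrt{|t - t'| + |h - h'|}.$$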

Theorem 1. Consider a multiscale statistic of the form (2.5). Work in model (1.1) under Assumption 1 and suppose that on $T$ the process $(t,h) \mapsto \sqrt{h}\, V_{t,h}^{-1}\int \psi_{t,h}(s)\, dW_s$ has continuous sample paths. Assume that $l_n n\log^{-3} n \to \infty$ and $u_n = o(1)$. Then, there exists a (two-sided) standard Brownian motion $W$, such that for $\nu > e$,

$$\sup_{G\in\mathcal{G}}\left| T_n - \sup_{(t,h)\in B_n} w_h\left(\frac{\big|\int \psi_{t,h}(s)\, dW_s\big|}{V_{t,h}} - \sqrt{2\log\tfrac{\nu}{h}}\right)\right| = O_P(r_n), \tag{2.6}$$

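with

$$r_n = \sup_{G\in\mathcal{G}}\|\hat g_n - g\|_\infty\, \frac{\log n}{\log\log n} + l_n^{-1/2} n^{-1/2}\, \frac{\log^{3/2} n}{\log\log n} + u_n^{1/2}\, \frac{\sqrt{\log(1/u_n)}}{\log\log(1/u_n)} + \kappa_n.$$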

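Moreover,

$$\sup_{(t,h)\in T} w_h\left(\frac{\big|\int \psi_{t,h}(s)\, dW_s\big|}{V_{t,h}} - \sqrt{2\log\tfrac{\nu}{h}}\right) < \infty, \quad \text{a.s.} \tag{2.7}$$

Hence, the approximating statistic in (2.6) is almost surely bounded from above by (2.7).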

The proof of the coupling in this theorem (cf. Appendix A) is based on generalizing techniques developed by Giné et al. [IS], while finiteness of the approximating test statistic utilizes results of Dümbgen and Spokoiny [12]. Note that Theorem 1 can be understood as a multiscale analog of the $L^\infty$-loss convergence for kernel estimators (cf. [jT^l [THl El l2Uj).

To give an example, let us assume that $\psi_{t,h} = \psi\big(\tfrac{\cdot - t}{h}\big)$ is a kernel function. By Lemmas C.5 and C.2, Assumption 1 holds for $V_{t,h} = \|\psi_{t,h}\|_2 = \sqrt{h}\,\|\psi\|_2$, whenever $\psi \neq 0$ on a Lebesgue measurable set, $\operatorname{TV}(\psi) < \infty$ and $\operatorname{supp}\psi \subset [0,1]$. Furthermore, by partial integration, we can easily verify that the process $(t,h) \mapsto \|\psi\|_2^{-1}\int \psi_{t,h}(s)\, dW_s$ has continuous sample paths (cf. [12], p. 144).

Remark 1. As a side remark let us mention that it is also possible to choose $B_n$ in order to construct (level-dependent) values for simultaneous wavelet thresholding. To this end observe that $\hat d_{j,k} = T_{k2^{-j},2^{-j}}$ and $d_{j,k} = \mathbb{E}\, T_{k2^{-j},2^{-j}} = \int \psi_{k2^{-j},2^{-j}}(s)\, g(s)\, ds = \int \psi(2^{j}s - k)\, g(s)\, ds$ are the (estimated) wavelet coefficients and if $j_{0n}$ and $j_{1n}$ are integers satisfying $2^{-j_{1n}}\, n\log^{-3} n \to \infty$ and $j_{0n} \to \infty$, then, for $\alpha \in (0,1)$, and

$$B_n = \big\{(k2^{-j}, 2^{-j}) \mid k = 0, 1, \ldots, 2^{j} - 1,\ j_{0n} \le j \le j_{1n},\ j \in \mathbb{N}\big\},$$

Theorem 1 yields in a natural way level-dependent thresholds $q_{j,k}(\alpha)$, such that

$$\lim_{n\to\infty} \mathbb{P}\Big(|\hat d_{j,k} - d_{j,k}| \le q_{j,k}(\alpha), \text{ for all } j, k \text{ with } (k2^{-j}, 2^{-j}) \in B_n\Big) = 1 - \alpha.$$



Let us close this section with a result on the lower bound of the approximating statistic.

Theorem 1 shows that the approximating statistic is almost surely bounded from above. Note that we have the trivial lower bound

$$T_n \ge -\inf_{(t,h)\in B_n} \frac{\log\tfrac{\nu}{h}}{\log\log\tfrac{\nu}{h}},$$

which converges to $-\infty$ in general and describes the behavior of $T_n$, provided the cardinality of $B_n$ is small (for instance if $B_n$ contains only one element). However, if $B_n$ is sufficiently rich, $T_n$ can be shown to be bounded from below, uniformly in $n$. Let us make this more precise. Assume that for every $n$ there exists a $K_n$ such that $K_n \to \infty$ and

$$B^{\circ}_{K_n} := \Big\{\Big(\tfrac{i}{K_n}, \tfrac{1}{K_n}\Big) \,\Big|\, i = 0, \ldots, K_n - 1\Big\} \subset B_n. \tag{2.8}$$

Then, the approximating statistic is asymptotically bounded from below by $-1/4$. This follows from Lemma C.1 in the appendix. It is a challenging problem to calculate the distribution for a general index set $B_n$ explicitly. Although the tail behavior has been studied for the one-scale case (cf. [18], [5]), this has not been addressed so far for the approximating statistic in Theorem 1. For implementation, later on, our method therefore relies on Monte Carlo simulations.
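To illustrate how such a Monte Carlo approximation may look, the following sketch simulates the approximating statistic of Theorem 1 in the simplest kernel case. It is not the implementation used in this paper. The choices $\psi = \mathbb{1}_{[0,1]}$ (so that $\int \psi_{t,h}\,dW = W_{t+h} - W_t$ and $V_{t,h} = \sqrt{h}$), the dyadic index set of Remark 1, $\nu = e^2$, the levels `j0`, `j1` and all function names are assumptions made for illustration.

```python
import math
import random

def w(h, nu):
    """Scale weight w_h = sqrt(0.5 * log(nu/h)) / log(log(nu/h)), for nu > e."""
    L = math.log(nu / h)
    return math.sqrt(0.5 * L) / math.log(L)

def simulate_statistic(j0, j1, nu, rng):
    """One draw of the supremum over dyadic (t, h) = (k 2^-j, 2^-j) of
    w_h * (|W_{t+h} - W_t| / sqrt(h) - sqrt(2 log(nu/h)))."""
    n_fine = 2 ** j1
    sd = math.sqrt(1.0 / n_fine)
    # Brownian motion on the finest dyadic grid of [0, 1]
    path = [0.0]
    for _ in range(n_fine):
        path.append(path[-1] + rng.gauss(0.0, sd))
    best = -math.inf
    for j in range(j0, j1 + 1):
        h = 2.0 ** (-j)
        stride = 2 ** (j1 - j)
        penalty = math.sqrt(2.0 * math.log(nu / h))
        weight = w(h, nu)
        for k in range(2 ** j):
            incr = path[(k + 1) * stride] - path[k * stride]
            best = max(best, weight * (abs(incr) / math.sqrt(h) - penalty))
    return best

def monte_carlo_quantile(alpha, n_rep, j0=2, j1=8, nu=math.exp(2.0), seed=0):
    """Empirical (1 - alpha)-quantile of n_rep independent draws."""
    rng = random.Random(seed)
    draws = sorted(simulate_statistic(j0, j1, nu, rng) for _ in range(n_rep))
    index = min(n_rep - 1, math.ceil((1.0 - alpha) * n_rep) - 1)
    return draws[index]

q = monte_carlo_quantile(alpha=0.05, n_rep=200)
print(q)
```

Every draw respects the trivial lower bound stated above, since each term in the supremum is at least $-w_h\sqrt{2\log(\nu/h)} = -\log(\nu/h)/\log\log(\nu/h)$.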



3 Testing for shape constraints in deconvolution 



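We start by defining the class of differential operators in (1.2). However, before we make this precise, let us define pseudo-differential operators in dimension one as well as fractional integration and differentiation. Given a real $m$, consider $S^m$, the space of functions $a : \mathbb{R}\times\mathbb{R} \to \mathbb{C}$ such that for all $\alpha, \beta \in \mathbb{N}$,

$$|\partial_x^\beta \partial_\xi^\alpha a(x,\xi)| \le C_{\alpha,\beta}\,(1 + |\xi|)^{m-\alpha} \quad \text{for all } x, \xi \in \mathbb{R}. \tag{3.1}$$

Then the pseudo-differential operator $\operatorname{Op}(a)$ corresponding to the symbol $a$ can be defined on the Schwartz space of rapidly decreasing functions $\mathcal{S}$ by

$$\operatorname{Op}(a) : \mathcal{S} \to \mathcal{S}, \qquad \operatorname{Op}(a)\phi(x) := \frac{1}{2\pi}\int e^{ix\xi}\, a(x,\xi)\, \mathcal{F}(\phi)(\xi)\, d\xi.$$

It is well-known that for any $s \in \mathbb{R}$, $\operatorname{Op}(a)$ can be extended to a continuous operator $\operatorname{Op}(a) : H^{m+s} \to H^{s}$. To improve readability, we only write Op for pseudo-differential operators and op in general for operators of the form (1.2). Throughout the paper, we write $\iota_s^{\alpha} = \exp(\alpha\pi i\operatorname{sign}(s)/2)$ and understand as usual $(\pm is)^{\alpha} = |s|^{\alpha}\iota_s^{\pm\alpha}$. The Gamma function evaluated at $\alpha$ will be denoted by $\Gamma(\alpha)$. Let us further introduce the Riemann-Liouville fractional integration operators on the real axis for $\alpha > 0$ by

$$(I_+^{\alpha}h)(x) := \frac{1}{\Gamma(\alpha)}\int_{-\infty}^{x} \frac{h(t)}{(x-t)^{1-\alpha}}\, dt \quad \text{and} \quad (I_-^{\alpha}h)(x) := \frac{1}{\Gamma(\alpha)}\int_{x}^{\infty} \frac{h(t)}{(t-x)^{1-\alpha}}\, dt. \tag{3.2}$$

For $\beta > 0$, we define the corresponding fractional differentiation operators $(D_+^{\beta}h)(x) := D^{n}(I_+^{n-\beta}h)(x)$ and $(D_-^{\beta}h)(x) := (-D)^{n}(I_-^{n-\beta}h)(x)$, where $n = \lfloor\beta\rfloor + 1$. For any $s \in \mathbb{R}$, we can extend $D_+^{\beta}$ and $D_-^{\beta}$ to continuous operators from $H^{\beta+s} \to H^{s}$ using the identity (cf. [29], p. 90),

$$\mathcal{F}(D_\pm^{\beta}h)(\xi) = (\pm i\xi)^{\beta}\,\mathcal{F}(h)(\xi) = \iota_\xi^{\pm\beta}|\xi|^{\beta}\,\mathcal{F}(h)(\xi). \tag{3.3}$$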

In this paper, we consider operators $\operatorname{op}(p)$ which "factorize" into a pseudo-differential operator and a fractional differentiation in Riemann-Liouville sense. More precisely, the symbol $p$ is in the class

$$\underline{S}^{m} := \big\{(x,\xi) \mapsto p(x,\xi) = a(x,\xi)\,|\xi|^{\gamma}\iota_\xi^{\mu} \;\big|\; a \in S^{\bar m},\ m = \bar m + \gamma,\ \gamma \in \{0\}\cup[1,\infty),\ \mu \in \mathbb{R}\big\}.$$

Let us mention that we cannot allow for all $\gamma > 0$ since in our proofs it is essential that $\partial_\xi^2 p(x,\xi)$ is integrable. The results can also be formulated for finite sums of symbols, i.e. $\sum_{j=1}^{J} p_j$ with $p_j \in \underline{S}^{m_j}$. However, for simplicity we restrict ourselves to $J = 1$.

Throughout the remaining part of the paper, we will always assume that $\operatorname{op}(p)f$ is continuous. A closed and axes-parallel rectangle in $\mathbb{R}^2$ with vertices $(a_1,b_1)$, $(a_1,b_2)$, $(a_2,b_1)$, $(a_2,b_2)$, $a_1 < a_2$, $b_1 < b_2$ will be denoted by $[a_1,a_2]\times[b_1,b_2]$.

The main objective of this paper is to obtain uniform confidence statements of the following kinds:

(i) The number and location of the roots and maxima of $\operatorname{op}(p)f$.

(ii) Simultaneous identification of intervals of the form $[t_i, t_i + h_i]$, $t_i \in [0,1]$, $h_i > 0$, $i$ in some index set $I$, with the following property: For a pre-specified confidence level we can conclude that for all $i \in I$ the functions $(\operatorname{op}(p)f)|_{[t_i, t_i+h_i]}$ attain, at least on a subset of $[t_i, t_i + h_i]$, positive values.

(ii') Same as (ii), but we want to conclude that $(\operatorname{op}(p)f)|_{[t_i, t_i+h_i]}$ has to attain negative values.

(iii) For any pair $(t,h) \in B_n$ with $B_n$ as in (2.3), we want to find $b_-(t,h,\alpha)$ and $b_+(t,h,\alpha)$, such that we can conclude that with overall confidence $1 - \alpha$, the graph of $\operatorname{op}(p)f$ (denoted as $\operatorname{graph}(\operatorname{op}(p)f)$ in the sequel) has a non-empty intersection with every rectangle $[t, t+h]\times[b_-(t,h,\alpha), b_+(t,h,\alpha)]$.

In the following we will refer to these goals as Problems (i), (ii), (ii') and (iii), respectively. Note that (ii) follows from (iii) by taking all intervals $[t, t+h]$ with $b_-(t,h,\alpha) > 0$. Analogously, $[t, t+h]$ satisfies (ii') whenever $b_+(t,h,\alpha) < 0$. The geometrical ordering of the intervals obtained by (ii) and (ii') yields in a straightforward way a lower bound for the number of roots of $\operatorname{op}(p)f$, solving Problem (i) (cf. also Dümbgen and Walther [13]). A confidence interval for the location of a root can be constructed as follows: If there exists $[t, t+h]$ such that $b_-(t,h,\alpha) > 0$ and $[\tilde t, \tilde t + \tilde h]$ with $b_+(\tilde t, \tilde h, \alpha) < 0$, then, with confidence $1 - \alpha$, $\operatorname{op}(p)f$ has a zero in the interval $[\min(t,\tilde t), \max(t + h, \tilde t + \tilde h)]$. The maximal number of disjoint intervals on which we find zeros is then an estimator for the number of roots.
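The construction of the last two sentences can be sketched as follows. The code is an illustration under assumptions, not the implementation of this paper: `pos` and `neg` are hypothetical lists of intervals produced by Problems (ii) and (ii'), every pair of a positive and a negative interval is merged into an interval containing a root, and the maximal number of pairwise disjoint root intervals is found by the standard greedy rule for interval scheduling.

```python
def root_intervals(pos, neg):
    """Intervals that contain a root: the hull of each pair consisting of
    an interval where a positive value is attained and one where a
    negative value is attained. Intervals are (left, right) tuples."""
    return [(min(p[0], q[0]), max(p[1], q[1])) for p in pos for q in neg]

def max_disjoint(intervals):
    """Maximal number of pairwise disjoint closed intervals, found greedily
    by always taking the admissible interval with the smallest right end."""
    count, frontier = 0, -float("inf")
    for left, right in sorted(intervals, key=lambda iv: iv[1]):
        if left > frontier:  # strictly to the right: no shared end point
            count += 1
            frontier = right
    return count

def estimate_number_of_roots(pos, neg):
    """Estimator for the number of roots described in the text."""
    return max_disjoint(root_intervals(pos, neg))

# Hypothetical output of Problems (ii) and (ii')
pos = [(0.0, 0.1), (0.5, 0.6)]
neg = [(0.2, 0.3), (0.7, 0.8)]
print(estimate_number_of_roots(pos, neg))
```

Since each root interval contains a zero of the continuous function $\operatorname{op}(p)f$ and the selected intervals are disjoint, the returned number never exceeds the true number of roots whenever all input intervals are correct.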

Example 1. Suppose $\operatorname{op}(p) = D$. In this case we want to find a collection of intervals $[t, t+h]$ such that with overall probability $1 - \alpha$ for each such interval there exists a nondegenerate subinterval on which $f$ is strictly monotonically increasing.

To state it differently, suppose that $f'$ is continuous and $\phi \ge 0$ is a kernel with support on $[0,1]$, i.e. $\phi \ge 0$ with $\int \phi(x)\, dx = 1$. Write $\phi_{t,h} := \phi\circ S_{t,h}$. If $\int \phi_{t,h}(x) f'(x)\, dx > 0$, then there is a nondegenerate subinterval of $[t, t+h]$ on which $f' > 0$. In particular, we can reject the null hypothesis that $f' \le 0$ on $[t, t+h]$ at level $1 - \alpha$. More generally, $\int \phi_{t,h}(x) f'(x)\, dx \in [a, b]$ implies by the intermediate value theorem that the graph of $f'$ intersects the rectangle $[t, t+h]\times[ah^{-1}, bh^{-1}]$ in at least one point.

Example 2. Suppose that we want to analyze the convexity/concavity properties of $U = q(X)$ (for instance $U = e^{X}$), where $q$ is a function which is strictly monotone increasing on the support of the distribution of $X$. Let $f_U$ denote the density of $U$. Then, by change of variables

$$f_U(y) = \frac{1}{q'(q^{-1}(y))}\, f(q^{-1}(y)),$$

and there is a pseudo-differential operator $\operatorname{Op}(p)$ with symbol

$$p(x,\xi) = -\frac{\xi^2}{(q'(x))^3} - \frac{3\, q''(x)}{(q'(x))^4}\, i\xi + \frac{3(q''(x))^2 - q'''(x)\, q'(x)}{(q'(x))^5}$$

such that $f_U''(y) = (\operatorname{op}(p)f)(q^{-1}(y))$. Therefore,

$$\operatorname{graph}(\operatorname{op}(p)f) \cap [t, t+h]\times[b_-(t,h,\alpha), b_+(t,h,\alpha)] \neq \emptyset$$

implies

$$\operatorname{graph}(f_U'') \cap [q(t), q(t+h)]\times[b_-(t,h,\alpha), b_+(t,h,\alpha)] \neq \emptyset.$$

In particular, if $b_-(t,h,\alpha) > 0$ then, with confidence $1 - \alpha$, we may conclude that $f_U$ is strictly convex on a nondegenerate subinterval of $[q(t), q(t+h)]$.

Example 3 (Noisy Wicksell problem). In the classical Wicksell problem, cross-sections of a plane with randomly distributed balls in three-dimensional space are observed. From these observations the distribution $H$ or density $h = H'$ of the squared radii of the balls has to be estimated (cf. Groeneboom and Jongbloed [22]). Statistically speaking, we have observations $X_1, \ldots, X_n$ with density $f$ satisfying the following relationship (cf. Golubev and Levit [21])

$$1 - H(x) \propto \int_x^{\infty} \frac{f(t)}{(t-x)^{1/2}}\, dt = \Gamma(\tfrac12)\,(I_-^{1/2}f)(x), \quad \text{for all } x \in [0,\infty),$$

where $\propto$ means up to a positive constant and $I_-^{1/2}$ is as in (3.2). Suppose now that we are interested in monotonicity properties of the density $h = H'$ on $[0,1]$. For $x > 0$, $-h'(x) \gtrless 0$ iff the fractional derivative of order $3/2$ satisfies $(D_-^{3/2}f)(x) = D^2(I_-^{1/2}f)(x) \gtrless 0$. It is reasonable to assume in applications that the observations are corrupted by measurement errors, which means we only observe $Y_i = X_i + \varepsilon_i$, as in model (1.1). This means we are in the framework described above and the shape constraint is given by $\operatorname{op}(p)f \gtrless 0$ for

$$p(x,\xi) = \iota_\xi^{-3/2}\,|\xi|^{3/2}.$$



In order to formulate our results in a proper way, let us introduce the following definitions. 



We say that a pseudo-differential operator $\operatorname{Op}(a)$ with $a \in S^{m}$ and $S^{m}$ as in (3.1) is elliptic if there exists $\xi_0$ such that $|a(x,\xi)| \ge K|\xi|^{m}$ for a positive constant $K$ and all $\xi$ satisfying $|\xi| \ge |\xi_0|$. For instance, in the framework of Example 2 ellipticity holds if $\|q'\|_\infty < \infty$.
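Furthermore, for an arbitrary symbol $p \in S^{m}$ let us denote by $\operatorname{Op}(p^\star)$ the adjoint of $\operatorname{Op}(p)$ with respect to the inner product $\langle\cdot,\cdot\rangle$. This is again a pseudo-differential operator and $p^\star \in S^{m}$. Formally, we can compute $p^\star$ by $p^\star(x,\xi) = e^{-i\partial_x\partial_\xi}\,\overline{p}(x,\xi)$, where $\overline{p}$ denotes the complex conjugate of $p$. Here the equality holds in the sense of asymptotic summation (for a precise statement see Theorem 18.1.7 in Hörmander [25]). Now, suppose that we have a symbol in $\underline{S}^{m}$ of the form $a|\xi|^{\gamma}\iota_\xi^{\mu} = a(x,\xi)|\xi|^{\gamma}\iota_\xi^{\mu}$ with $a \in S^{\bar m}$ and $\bar m + \gamma = m$. Since for any $u, v \in H^{m}$,

$$\big\langle \operatorname{op}(a|\xi|^{\gamma}\iota_\xi^{\mu})u, v\big\rangle = \big\langle \operatorname{Op}(a)\operatorname{op}(|\xi|^{\gamma}\iota_\xi^{\mu})u, v\big\rangle = \big\langle \operatorname{op}(|\xi|^{\gamma}\iota_\xi^{\mu})u, \operatorname{Op}(a^\star)v\big\rangle = \big\langle u, \operatorname{op}(|\xi|^{\gamma}\iota_\xi^{-\mu})\operatorname{Op}(a^\star)v\big\rangle \tag{3.4}$$

we conclude that $\mathcal{F}\big(\operatorname{op}(a|\xi|^{\gamma}\iota_\xi^{\mu})^\star\phi\big) = |\xi|^{\gamma}\iota_\xi^{-\mu}\,\mathcal{F}(\operatorname{Op}(a^\star)\phi)$ for all $\phi \in H^{m}$.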

In order to formulate the assumptions and the main result, let us fix one symbol $p \in \underline{S}^{m}$ and one factorization $p(x,\xi) = a(x,\xi)|\xi|^{\gamma}\iota_\xi^{\mu}$ with $a, \gamma, \mu$ as in the definition of $\underline{S}^{m}$.

Assumption 2. We assume that there is a positive real number $r > 0$ and constants $0 < C_l \le C_u < \infty$ such that the characteristic function of $\varepsilon$ is bounded from below and above by

$$C_l\langle s\rangle^{-r} \le |\mathbb{E}\, e^{-is\varepsilon}| = |\mathcal{F}(f_\varepsilon)(s)| \le C_u\langle s\rangle^{-r} \quad \text{for all } s \in \mathbb{R}.$$

Moreover, suppose that the second derivative of $\mathcal{F}(f_\varepsilon)$ exists and

$$\langle s\rangle\,|D\mathcal{F}(f_\varepsilon)(s)| + \langle s\rangle^{2}\,|D^{2}\mathcal{F}(f_\varepsilon)(s)| \le C_u\langle s\rangle^{-r} \quad \text{for all } s \in \mathbb{R}.$$

These are the classical assumptions on the decay of the Fourier transform of the error density in the moderately ill-posed case (cf. Assumptions (G1) and (G3) in Fan [16]). Heuristically, we can think of $\mathcal{F}(f_\varepsilon)$ as an elliptic symbol in $S^{-r}$.
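As a sanity check of Assumption 2, consider the standard Laplace density $f_\varepsilon(x) = e^{-|x|}/2$, whose Fourier transform is $1/(1+s^2) = \langle s\rangle^{-2}$, so that $r = 2$. The following sketch verifies both inequalities on a grid. It is an illustration only; the constants $C_l = 1$ and $C_u = 8$, the grid and the function names are assumptions chosen for this example.

```python
import math

def bracket(s):
    """<s> = (1 + s^2)^{1/2}."""
    return math.sqrt(1.0 + s * s)

def cf(s):
    """Fourier transform of the standard Laplace density exp(-|x|)/2."""
    return 1.0 / (1.0 + s * s)

def cf_d1(s):
    """First derivative of cf."""
    return -2.0 * s / (1.0 + s * s) ** 2

def cf_d2(s):
    """Second derivative of cf."""
    return (6.0 * s * s - 2.0) / (1.0 + s * s) ** 3

def check_assumption_2(r=2.0, c_lower=1.0, c_upper=8.0, s_max=200.0, n=4001):
    """Check both inequalities of Assumption 2 on an equidistant grid."""
    for k in range(n):
        s = -s_max + 2.0 * s_max * k / (n - 1)
        scale = bracket(s) ** (-r)
        # small relative slack guards against floating point rounding
        decay_ok = c_lower * scale * (1.0 - 1e-12) <= abs(cf(s)) <= c_upper * scale
        smooth_ok = (bracket(s) * abs(cf_d1(s))
                     + bracket(s) ** 2 * abs(cf_d2(s))) <= c_upper * scale
        if not (decay_ok and smooth_ok):
            return False
    return True

print(check_assumption_2())
```

The constant $C_u = 8$ comes from the elementary bounds $\langle s\rangle|D\mathcal{F}(f_\varepsilon)(s)| \le 2\langle s\rangle^{-2}$ and $\langle s\rangle^2|D^2\mathcal{F}(f_\varepsilon)(s)| \le 6\langle s\rangle^{-2}$; with a wrong decay exponent the check fails.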

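Let Re denote the projection on the real part. For sufficiently smooth $\phi$, consider the test statistic

$$T_{t,h} := \frac{1}{\sqrt{n}}\sum_{k=1}^{n} \operatorname{Re} v_{t,h}(Y_k) = \frac{1}{\sqrt{n}}\sum_{k=1}^{n} \operatorname{Re} v_{t,h}(G^{-1}(U_k)) \tag{3.5}$$

with

$$v_{t,h} = \mathcal{F}^{-1}\Big(\lambda_\mu^{\gamma}(\cdot)\,\mathcal{F}\big(\operatorname{Op}(a^\star)(\phi\circ S_{t,h})\big)\Big) \tag{3.6}$$

and

$$\lambda(s) = \lambda_\mu^{\gamma}(s) = \frac{|s|^{\gamma}\iota_s^{-\mu}}{\mathcal{F}(f_\varepsilon)(-s)}. \tag{3.7}$$

From (1.3) and (3.4), we find that for $f \in H^{m}$,

$$\mathbb{E}\,T_{t,h} = \sqrt{n}\int (\phi\circ S_{t,h})(x)\,\operatorname{Re}(\operatorname{op}(p)f)(x)\, dx.$$

Proceeding as in Section 2 we consider the multiscale statistic

$$T_n = \sup_{(t,h)\in B_n} w_h\left(\frac{|T_{t,h} - \mathbb{E}[T_{t,h}]|}{\sqrt{\hat g_n(t)}\,\|v_{t,h}\|_2} - \sqrt{2\log\tfrac{\nu}{h}}\right), \tag{3.8}$$

i.e. with the notation of (2.5), we set $\psi_{t,h} := \operatorname{Re} v_{t,h}$ and $V_{t,h} := \|v_{t,h}\|_2$. Define further

$$T_n^{\infty}(W) := \sup_{(t,h)\in B_n} w_h\left(\frac{\big|\int \operatorname{Re} v_{t,h}(s)\, dW_s\big|}{\|v_{t,h}\|_2} - \sqrt{2\log\tfrac{\nu}{h}}\right).$$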

Theorem 2. Consider an operator $\operatorname{op}(p)$ with symbol $p \in \underline{S}^{m}$ and let $T_n$ be as in (3.8). Work in model (1.1) under Assumption 2. Suppose that

(i) $l_n n\log^{-3} n \to \infty$ and $u_n = o(\log^{-3} n)$,

(ii) $\phi \in H_4^{\lfloor r+m+5/2\rfloor}$, $\operatorname{supp}\phi \subset [0,1]$, and $\operatorname{TV}(D^{\lfloor r+m+5/2\rfloor}\phi) < \infty$,

(iii) $\operatorname{Op}(a)$ is elliptic.

Then, there exists a (two-sided) standard Brownian motion $W$, such that for $\nu > e$,

$$\sup_{G\in\mathcal{G}_{c,C,q}} \big|T_n - T_n^{\infty}(W)\big| = o_P(r_n), \tag{3.9}$$

with

$$r_n = \sup_{G\in\mathcal{G}}\|\hat g_n - g\|_\infty\, \frac{\log n}{\log\log n} + l_n^{-1/2} n^{-1/2}\, \frac{\log^{3/2} n}{\log\log n} + u_n^{1/2}\log^{3/2} n.$$

Moreover,

$$\sup_{(t,h)\in T} w_h\left(\frac{\big|\int \operatorname{Re} v_{t,h}(s)\, dW_s\big|}{\|v_{t,h}\|_2} - \sqrt{2\log\tfrac{\nu}{h}}\right) < \infty, \quad \text{a.s.} \tag{3.10}$$

Hence, the approximating statistic $T_n^{\infty}(W)$ is almost surely bounded from above by (3.10).

One can easily show using Lemma C.1 that if $B_n$ contains (2.8) and the symbol $p$ does not depend on $t$, then the approximating statistic is also bounded from below. Furthermore, the case $\varepsilon = 0$ can be treated as well (we can define $\mathcal{F}(f_\varepsilon) = 1$ in this case). In particular, our framework allows for the important case $\varepsilon = 0$ and $\operatorname{op}(p)$ the identity operator, which cannot be treated with the results from [13].



For special choices of $p$ and $f_\varepsilon$ the functions $(v_{t,h})_{t,h}$ have a much simpler form, which allows one to read off the ill-posedness of the problem from the index of the pseudo-differential operator associated with $v_{t,h}$. Let us shortly discuss this. Suppose Assumption 2 holds and additionally $\langle s\rangle^{k}|D^{k}\mathcal{F}(f_\varepsilon)(s)| \le C_k\langle s\rangle^{-r}$ for all $s \in \mathbb{R}$ and $k = 3, 4, \ldots$ Then $(x,\xi) \mapsto \mathcal{F}(f_\varepsilon)(-\xi)$ defines a symbol in $S^{-r}$. Because of the lower bound in Assumption 2, $C_l\langle\xi\rangle^{-r} \le |\mathcal{F}(f_\varepsilon)(-\xi)|$, the corresponding pseudo-differential operator is elliptic and $(x,\xi) \mapsto 1/\mathcal{F}(f_\varepsilon)(-\xi)$ is the symbol of a parametrix and consequently an element in $S^{r}$ (cf. Hörmander [25], Theorem 18.1.9). If $\phi \in H^{r+m}$ and $p \in S^{m}\cap\underline{S}^{m}$, then

$$v_{t,h}(u) = \frac{1}{2\pi}\int \mathcal{F}\Big(\operatorname{Op}\Big(\frac{1}{\mathcal{F}(f_\varepsilon)(-\cdot)}\Big)\circ\operatorname{Op}(p^\star)(\phi\circ S_{t,h})\Big)(s)\, e^{isu}\, ds = \operatorname{Op}\Big(\frac{1}{\mathcal{F}(f_\varepsilon)(-\cdot)}\Big)\circ\operatorname{Op}(p^\star)(\phi\circ S_{t,h})(u).$$

Pseudo-differential operators are closed under composition. More precisely, $p_j \in S^{m_j}$, $j = 1, 2$ implies that the symbol of the composed operator is in $S^{m_1+m_2}$. Therefore, there is a symbol $\tilde p \in S^{m+r}$ such that $v_{t,h} = \operatorname{Op}(\tilde p)(\phi\circ S_{t,h})$. Hence, for fixed $h$, the function $t \mapsto v_{t,h}$ can be viewed as a kernel estimator with bandwidth $h$. Furthermore, the problem is completely determined by the composition $\operatorname{Op}(\tilde p)$ and this yields a heuristic argument why (as it will turn out later) the ill-posedness of the detection problem $\operatorname{Re}\operatorname{op}(p)f \gtrless 0$ in model (1.1) is determined by the sum $m + r$, i.e.

ill-posedness of the shape constraint + ill-posedness of the deconvolution problem.

Suppose further that $r$ and $m$ are integers and $\operatorname{Op}(p)$ is a differential operator of the form

$$\sum_{k=1}^{m} a_k(x)\, D^{k} \tag{3.11}$$

with smooth functions $a_k$, $k = 1, \ldots, m$ and $a_m$ bounded uniformly away from zero. If $1/\mathcal{F}(f_\varepsilon)(-\cdot)$ is a polynomial of degree $r$ (which is true for instance if $\varepsilon$ is Exponential, Laplace or Gamma distributed) then $\operatorname{Op}(\tilde p)$ is again of the form (3.11) but with degree $m + r$ and hence $v_{t,h}(u)$ is essentially a linear combination of derivatives of $\phi$ evaluated at $(u - t)/h$. However, these assumptions on the error density are far too restrictive. In the following paragraph we will show that even under more general conditions the approximating statistic has a very simple form.
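As a worked instance of the polynomial case, take $\operatorname{op}(p) = D$ and Laplace errors with density $e^{-|x|/\theta}/(2\theta)$, for which $1/\mathcal{F}(f_\varepsilon)(-s) = 1 + \theta^2 s^2$ corresponds to the differential operator $1 - \theta^2 D^2$. Then $v_{t,h} = (1 - \theta^2 D^2)(-D)(\phi\circ S_{t,h}) = -h^{-1}\phi'\circ S_{t,h} + \theta^2 h^{-3}\phi'''\circ S_{t,h}$. The sketch below checks by quadrature that $\int v_{t,h}\, g = \int (\phi\circ S_{t,h}) f'$. It is an illustration under assumptions: the kernel $140x^3(1-x)^3$, the standard normal $f$, $\theta = 0.5$, the closed form used for $g = f * f_\varepsilon$ and all names are choices made for this example.

```python
import math

THETA = 0.5  # scale of the Laplace error (illustrative assumption)

def phi(x):
    """Kernel 140 x^3 (1 - x)^3 on [0, 1]; phi, phi', phi'' vanish at 0 and 1."""
    return 140.0 * x**3 * (1.0 - x)**3 if 0.0 <= x <= 1.0 else 0.0

def phi_d1(x):
    """First derivative of phi."""
    return 140.0 * (3*x**2 - 12*x**3 + 15*x**4 - 6*x**5) if 0.0 <= x <= 1.0 else 0.0

def phi_d3(x):
    """Third derivative of phi."""
    return 140.0 * (6 - 72*x + 180*x**2 - 120*x**3) if 0.0 <= x <= 1.0 else 0.0

def f(x):
    """Standard normal density, an illustrative choice for f."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def f_prime(x):
    return -x * f(x)

def g(u):
    """Density of Y = X + eps, X standard normal, eps Laplace with scale THETA."""
    a = 1.0 / THETA
    c = math.exp(0.5 * a * a) / (4.0 * THETA)
    return c * (math.exp(-u * a) * math.erfc((a - u) / math.sqrt(2.0))
                + math.exp(u * a) * math.erfc((a + u) / math.sqrt(2.0)))

def v(u, t, h):
    """v_{t,h}(u) = (1 - THETA^2 D^2)(-D) phi((u - t)/h)."""
    x = (u - t) / h
    return -phi_d1(x) / h + THETA**2 * phi_d3(x) / h**3

def integrate(func, lo, hi, n=20000):
    """Composite midpoint rule on [lo, hi]."""
    step = (hi - lo) / n
    return step * sum(func(lo + (k + 0.5) * step) for k in range(n))

t, h = -1.5, 0.8
lhs = integrate(lambda u: v(u, t, h) * g(u), t, t + h)              # E v_{t,h}(Y)
rhs = integrate(lambda x: phi((x - t) / h) * f_prime(x), t, t + h)  # <phi o S, f'>
print(lhs, rhs)
```

The agreement of both integrals reflects $(1 - \theta^2 D^2) g = f$ together with three partial integrations, for which the boundary terms vanish because $\phi$, $\phi'$ and $\phi''$ are zero at both end points.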

Principal symbol. In order to perform our test, it is necessary to compute quantiles of the approximating statistic in Theorem 2. Since the approximating statistic has a relatively complex structure, let us give conditions under which it can be simplified considerably. First, we impose a condition on the asymptotic behavior of the Fourier transform of the errors. Similar conditions have been studied by Fan [16] and Bissantz et al. [5]. Recall that for any $\alpha, \bar\alpha \in \mathbb{R}$, $s \neq 0$, $D\,\iota_s^{\bar\alpha}|s|^{\alpha} = D\,(is)^{\alpha_1}(-is)^{\alpha_2} = \alpha\, i\,\iota_s^{\bar\alpha - 1}|s|^{\alpha - 1}$ with $\alpha_1 = (\alpha + \bar\alpha)/2$ and $\alpha_2 = (\alpha - \bar\alpha)/2$.

Assumption 3. Suppose that there exist $\beta_0 > 1/2$, $\rho \in [0,4)$, and positive numbers $A, C_\varepsilon$, such that

$$\big|A\,\iota_s^{\rho}|s|^{r}\,\mathcal{F}(f_\varepsilon)(s) - 1\big| + \big|A r^{-1} i\,\iota_s^{\rho+1}|s|^{r+1}\, D\mathcal{F}(f_\varepsilon)(s) - 1\big| \le C_\varepsilon\langle s\rangle^{-\beta_0}, \quad \text{for all } s \in \mathbb{R}.$$

Assumption 4. Given $m \in \{0\}\cup[1,\infty)$ suppose there exists a decomposition $p = p_P + p_R$ such that $p_R \in \underline{S}^{m'}$ for some $m' < m$, and

$$p_P(x,\xi) = a_P(x)\,|\xi|^{m}\iota_\xi^{\mu}, \quad \text{for all } x, \xi \in \mathbb{R},$$

with $(x,\xi) \mapsto a_P(x) \in S^{0}$, $a_P$ real-valued and $|a_P(\cdot)| > 0$.

For $s \neq 0$, $\iota_s^{2} = -1$. Assume that in the special case $m = 0$ we have $|\rho + \mu| < r$. Then, we can (and will) always choose $\rho$ and $\mu$ in Assumptions 3 and 4 such that $\sigma = (r + m + \rho + \mu)/2$ and $\tau = (r + m - \rho - \mu)/2$ are non-negative. The symbol $p_P$ is called principal symbol. We will see that, together with the characteristics from the error density, it completely determines the asymptotics. The condition basically means that there is a smooth function $a_P$, such that the highest order of the pseudo-differential operator coincides with $a_P(x)D^{m}$. Note that principal symbols are usually defined in a slightly more general sense, however Assumption 4 turns out to be appropriate for our purposes.

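In the following, we investigate the approximation of the multiscale test statistic

$$T_n^{P} := \sup_{(t,h)\in B_n} w_h\left(\frac{|T_{t,h} - E\,T_{t,h}|}{\sqrt{\hat g_n(t)}\,|A a_P(t)|\,h^{1/2-m-r}\,\|D_+^{r+m}\phi\|_2} - \sqrt{2\log\frac{\nu}{h}}\right) \qquad (3.12)$$

by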

Theorems. Work under Assumptions\^\^and\^ Suppose further, that 

(i) Inulog-^n^ oo and u„ = o(log~(=^'^('"-"'')"') n), 
(ii) (j) e _^|'-+™+5/2j ^ suppc/) C [0, 1], and TV(L'Lr+™+5/2j^) < ^^ 

(iii) If $m = 0$, assume that $r > 1/2$ and $|\rho + \mu| < r$.
Then, there exists a (two-sided) standard Brownian motion $W$ such that for $\nu > e$,

$$\sup_{G \in \mathcal{G}} \left|T_n^{P} - T_n^{P,\infty}(W)\right| = o_P(1),$$

and the approximating statistic $T_n^{P,\infty}(W)$ is almost surely bounded from above by



s^P ^W ' ^ mr+:nl).-tL ' - \/21og^ < OO, a.s. (3.13) 



4 Confidence statements 

4.1 Confidence rectangles 

Suppose that Theorem 2 holds. The distribution of $T_n^{\infty}(W)$ depends only on known quantities.
By ignoring the $o_P(1)$ term on the right hand side of (3.9), we can therefore simulate
the distribution of $T_n$. To formulate it differently, the distance between the $(1-\alpha)$-quantiles
of $T_n$ and $T_n^{\infty}(W)$ tends asymptotically to zero, although $T_n^{\infty}(W)$ does not need to have a
weak limit. The $(1-\alpha)$-quantile of $T_n^{\infty}(W)$ will be denoted by $q_\alpha(T_n^{\infty}(W))$ in the sequel.

In order to obtain a confidence band one has to control the bias, which requires a Hölder
condition on $\mathrm{op}(p)f$. However, since we are more interested in a qualitative analysis, it
suffices to assume that $\mathrm{op}(p)f$ is continuous (and $f \in H^m$ in order to define the scalar
product of $\mathrm{op}(p)f$ properly). Moreover, instead of a moment condition on the kernel $\phi$, we
require positivity, i.e. for the remaining part of this work, let us assume that $\phi \ge 0$ and
$\int \phi(u)\,du = 1$. Therefore, we can conclude that asymptotically with probability $1 - \alpha$, for
all $(t,h) \in B_n$,

$$\langle \phi_{t,h}, \mathrm{op}(p)f \rangle \in \left[\frac{T_{t,h} - d_{t,h}}{\sqrt{n}},\; \frac{T_{t,h} + d_{t,h}}{\sqrt{n}}\right], \qquad (4.1)$$

where

$$d_{t,h} := \sqrt{\hat g_n(t)}\,\|v_{t,h}\|_2\,\sqrt{2\log\frac{\nu}{h}}\left(1 + q_\alpha(T_n^{\infty}(W))\,\frac{\log\log\frac{\nu}{h}}{\log\frac{\nu}{h}}\right).$$

Using the continuity of $\mathrm{op}(p)f$, it follows that asymptotically with confidence $1 - \alpha$, for
all $(t,h) \in B_n$, the graph of $x \mapsto \mathrm{op}(p)f(x)$ has a non-empty intersection with each of the
rectangles

$$[t, t+h] \times \left[\frac{T_{t,h} - d_{t,h}}{h\sqrt{n}},\; \frac{T_{t,h} + d_{t,h}}{h\sqrt{n}}\right]. \qquad (4.2)$$

This means we find a solution of (iii) by setting

$$b_-(t,h,\alpha) := \frac{T_{t,h} - d_{t,h}}{h\sqrt{n}}, \qquad b_+(t,h,\alpha) := \frac{T_{t,h} + d_{t,h}}{h\sqrt{n}}. \qquad (4.3)$$
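As a rough numerical illustration of (4.1)-(4.3), the following Python sketch computes the half-width $d_{t,h}$ and the resulting rectangle. The function names and the way the inputs are passed (pre-computed values of $T_{t,h}$, $\hat g_n(t)$, $\|v_{t,h}\|_2$ and the quantile) are assumptions made for this example only.

```python
import math


def half_width(g_hat_t, v_norm, h, nu, q_alpha):
    """Half-width d_{t,h} of the confidence statement (4.1).

    g_hat_t : value of the density estimate of g at t (assumed given)
    v_norm  : L2 norm of v_{t,h} (assumed given)
    h       : scale, 0 < h <= 1
    nu      : calibration constant nu > e
    q_alpha : (1 - alpha)-quantile of the approximating statistic
    """
    log_term = math.log(nu / h)
    correction = 1.0 + q_alpha * math.log(log_term) / log_term
    return math.sqrt(g_hat_t) * v_norm * math.sqrt(2.0 * log_term) * correction


def rectangle(T, d, t, h, n):
    """Return (left, right, b_minus, b_plus) as in (4.2) and (4.3)."""
    scale = h * math.sqrt(n)
    return (t, t + h, (T - d) / scale, (T + d) / scale)
```

A rectangle lies in the upper half-plane exactly when `T > d`, which is the situation exploited for Problems (ii) and (ii').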



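If instead Theorem 3 holds, we obtain by similar arguments that asymptotically with
confidence $1 - \alpha$, for all $(t,h) \in B_n$, the graph of $x \mapsto \mathrm{op}(p)f(x)$ has a non-empty intersection
with each of the rectangles

$$[t, t+h] \times \left[\frac{T_{t,h} - d^{P}_{t,h}}{h\sqrt{n}},\; \frac{T_{t,h} + d^{P}_{t,h}}{h\sqrt{n}}\right] \qquad (4.4)$$

with

$$d^{P}_{t,h} := \sqrt{\hat g_n(t)}\,|A a_P(t)|\,h^{1/2-m-r}\,\|D_+^{r+m}\phi\|_2\,\sqrt{2\log\frac{\nu}{h}}\left(1 + q_\alpha(T_n^{P,\infty}(W))\,\frac{\log\log\frac{\nu}{h}}{\log\frac{\nu}{h}}\right) \qquad (4.5)$$

and $q_\alpha(T_n^{P,\infty}(W))$ the $(1-\alpha)$-quantile of $T_n^{P,\infty}(W)$. Therefore we find a solution with

$$b_-(t,h,\alpha) := \frac{T_{t,h} - d^{P}_{t,h}}{h\sqrt{n}}, \qquad b_+(t,h,\alpha) := \frac{T_{t,h} + d^{P}_{t,h}}{h\sqrt{n}}.$$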

Finally let us mention that instead of rectangles we can also cover $\mathrm{op}(p)f$ by ellipses. Note
that in particular a rectangle is an ellipse with respect to the $\|\cdot\|_\infty$ vector norm on $\mathbb{R}^2$, i.e.
(up to translation) a set of the form $\{(x_1,x_2) : \max(a|x_1|, b|x_2|) = 1\}$ for positive $a$, $b$.

Figure 1: If the graph of $\mathrm{op}(p)f$ intersects $R_1$ and $R_2$, then it also intersects $R$ (left). If graph($\mathrm{op}(p)f$)
intersects $R$ and $R_1$, then it also intersects $R'$ (right).



4.2 Structure on confidence rectangles 



For any $(t,h) \in B_n$ the multiscale method returns a rectangle of the form (4.2) (or (4.4)).
However, most of the rectangles are redundant since the fact that graph($\mathrm{op}(p)f$) intersects
these rectangles can already be deduced from the position of other rectangles (see for
instance Figure 1) and the assumption that $\mathrm{op}(p)f$ is continuous. Naturally, we are interested
in the set of rectangles which are informative in the sense that they contain information on
the signal which cannot be deduced from other rectangles. Let us describe in three steps
(A), (B), (B') how to discard redundant rectangles.

(A) Fix $(t,h) \in B_n$. Suppose there exist $(t_1,h_1), (t_2,h_2) \in B_n$ ($(t_1,h_1)$ and $(t_2,h_2)$ not
necessarily different) such that $[t_1,t_1+h_1], [t_2,t_2+h_2] \subset [t,t+h]$, $b_+(t_1,h_1,\alpha) < b_+(t,h,\alpha)$ and
$b_-(t_2,h_2,\alpha) > b_-(t,h,\alpha)$. Denote by $R, R_1, R_2$ the rectangles obtained from $(t,h)$, $(t_1,h_1)$
and $(t_2,h_2)$, respectively (for an illustration see Figure 1). Since $\mathrm{op}(p)f$ is assumed to be
continuous, the intermediate value theorem shows that graph($\mathrm{op}(p)f$) $\cap R_1 \neq \emptyset$ and
graph($\mathrm{op}(p)f$) $\cap R_2 \neq \emptyset$ imply graph($\mathrm{op}(p)f$) $\cap R \neq \emptyset$. Hence, in this case, $R$ is
non-informative and will be discarded.

(B) Fix $(t,h) \in B_n$ and denote the induced rectangle by $R$. Suppose there exists $(t_1,h_1) \in B_n$
such that $[t_1,t_1+h_1] \subset [t,t+h]$ and $b_-(t_1,h_1,\alpha) < b_-(t,h,\alpha) < b_+(t_1,h_1,\alpha) < b_+(t,h,\alpha)$
(see Figure 1). Define $R' := [t,t+h] \times [b_-(t,h,\alpha), b_+(t_1,h_1,\alpha)]$. Then, $R'$ is
contained in $R$ and graph($\mathrm{op}(p)f$) $\cap R' \neq \emptyset$. Therefore, we replace $R$ by $R'$.

(B') Same as (B), but consider the case $b_-(t,h,\alpha) < b_-(t_1,h_1,\alpha) < b_+(t,h,\alpha) < b_+(t_1,h_1,\alpha)$.
With $R' := [t,t+h] \times [b_-(t_1,h_1,\alpha), b_+(t,h,\alpha)]$ we obtain graph($\mathrm{op}(p)f$) $\cap R' \neq \emptyset$. Therefore,
we replace $R$ by $R'$.

Throughout the following, let us refer to the remaining rectangles after application of
(A), (B) and (B') as (set of) minimal rectangles.
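The three steps can be sketched in code as follows. This is a minimal illustration and not the authors' implementation: the `Rect` type, the function names, and the choice to apply (A) first and then (B)/(B') with a user-supplied list of witness rectangles are assumptions made for the example.

```python
from typing import List, NamedTuple


class Rect(NamedTuple):
    t: float   # left end of the interval [t, t + h]
    h: float   # scale (interval length)
    lo: float  # b_-(t, h, alpha)
    hi: float  # b_+(t, h, alpha)


def nested(inner: Rect, outer: Rect) -> bool:
    """True if [inner.t, inner.t + inner.h] is contained in [outer.t, outer.t + outer.h]."""
    return outer.t <= inner.t and inner.t + inner.h <= outer.t + outer.h


def discard_redundant(rects: List[Rect]) -> List[Rect]:
    """Step (A): drop R if nested rectangles force the graph to pass through R."""
    keep = []
    for r in rects:
        others = [s for s in rects if s is not r and nested(s, r)]
        from_above = any(s.hi < r.hi for s in others)  # witness R_1
        from_below = any(s.lo > r.lo for s in others)  # witness R_2
        if not (from_above and from_below):
            keep.append(r)
    return keep


def shrink(rects: List[Rect], witnesses: List[Rect]) -> List[Rect]:
    """Steps (B) and (B'): tighten the vertical range of R using nested rectangles."""
    out = []
    for r in rects:
        lo, hi = r.lo, r.hi
        for s in witnesses:
            if s is r or not nested(s, r):
                continue
            if s.lo < lo < s.hi < hi:   # case (B): lower the upper bound
                hi = s.hi
            if lo < s.lo < hi < s.hi:   # case (B'): raise the lower bound
                lo = s.lo
        out.append(Rect(r.t, r.h, lo, hi))
    return out
```

Because every rectangle of the original family intersects the graph on the confidence event, any of them may serve as a witness, so a call such as `shrink(discard_redundant(rects), rects)` is one possible way to combine the steps.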



4.3 Comparison with confidence bands 

Let us shortly comment on the relation between confidence rectangles and confidence bands.
Fix one scale $h = h_n$ and consider $B_n = [0,1] \times \{h\}$. For simplicity let us further restrict
to the framework of Theorem 2. From (4.1), we obtain that

$$t \mapsto \left[\frac{T_{t,h} - d_{t,h}}{h\sqrt{n}},\; \frac{T_{t,h} + d_{t,h}}{h\sqrt{n}}\right] \qquad (4.6)$$

is a uniform $(1-\alpha)$-confidence band for the locally averaged function $t \mapsto \frac{1}{h}\langle \phi_{t,h}, \mathrm{op}(p)f \rangle$.
Restricting to scales on which the stochastic error dominates the bias $|\mathrm{op}(p)f - \frac{1}{h}\langle \phi_{t,h}, \mathrm{op}(p)f \rangle|$
(for instance by slightly undersmoothing) we can, inflating (4.6) by a small amount, easily
construct asymptotic confidence bands for $\mathrm{op}(p)f$ as well. Note that Theorem 2 does
not require that $s^r \mathcal{F}(f_\epsilon)(s)$ converges to a constant and therefore we can construct
confidence bands for situations which are not covered within the framework of [5]. For adaptive
confidence bands in density deconvolution see the recent work by Lounici and Nickl [35].

However, the construction of confidence bands described above will not work on scales
where we oversmooth or if bias and stochastic error are of the same order. The strength
of the multiscale approach lies in the fact that for confidence rectangles all scales can be used
simultaneously. This allows for another view on confidence rectangles. The accompanying
figure displays a band (4.6) computed for a large scale/bandwidth which obviously does not cover
$\mathrm{op}(p)f$. Now, take a point, $t_0$ say; then (4.2) is equivalent to the existence of a point
$t_0^* \in [t_0, t_0+h]$ such that the confidence interval $[A, B]$ at $t_0$ shifted to $t_0^*$ contains
$\mathrm{op}(p)f(t_0^*)$. Thus, confidence rectangles also account for the uncertainty of
$t \mapsto \mathrm{op}(p)f(t)$ along the $t$-axis.

Figure: Obtaining confidence rectangles from bands.


5 Choice of kernel and performance of the multiscale statistic

In this section, we investigate the size/area of the rectangles constructed in the previous
paragraphs. Recall that by (1.3) the expectation of the statistic $T_{t,h}$ depends in general
on $\mathrm{op}(p)$. In contrast, Theorem 3 shows that the variance of $T_{t,h}$ depends asymptotically
only on the principal symbol, which acts on $\phi$ as a differentiation operator of order $m + r$.
Therefore, the $(m+r)$-th derivative of $\phi$ appears in the approximating statistic $T_n^{P,\infty}(W)$,
but no other derivative does. In fact, we shall see in this section that the scaling property of
the confidence rectangles can be compared to the convergence rates appearing in estimation
of the $(m + r)$-th derivative of a density.

5.1 Optimal choice of the kernel 

In the following, we are going to study the problem of finding the optimal function $\phi$.
If $m + r \in \mathbb{N}$ and the confidence statements are formulated based on the conclusions of
Theorem 3, this can be done explicitly.

Note that for given $(t,h) \in B_n$, the width of the rectangle (4.4) is given by $2d^{P}_{t,h}/(h\sqrt{n})$.
Further, the choice of $\phi$ influences the value of $d^{P}_{t,h}$ in two ways, namely by the factor
$\|D_+^{r+m}\phi\|_2 = \|D^{r+m}\phi\|_2$ as well as the quantile $q_\alpha(T_n^{P,\infty}(W))$ (cf. the definition of $d^{P}_{t,h}$
given in (4.5)). Since $\alpha$ is fixed, we have



.P,oo/w^^^°gl°g^ 



& h 

Therefore, $d^{P}_{t,h}$ depends in first order on $D^{r+m}\phi$ and our optimization problem can be
reformulated as

$$\text{minimize } \|D^{r+m}\phi\|_2, \quad \text{subject to } \int \phi(u)\,du = 1.$$

This is in fact easy to solve if we additionally assume that $\phi \in H^q$ with $r + m \le q <
r + m + 1/2$. By Lagrange calculus, we find that on $(0,1)$, $\phi$ has to be a polynomial
of order $2m + 2r$. Under the induced boundary conditions $\phi^{(k)}(0) = \phi^{(k)}(1) = 0$ for
$k = 0, \ldots, r + m - 1$, the solution $\phi_{m+r}$ has the form

$$\phi_{m+r}(x) = c_{m+r}\,x^{m+r}(1-x)^{m+r}\,\mathbf{1}_{(0,1)}(x). \qquad (5.1)$$

Due to the normalization constraint $\int \phi_{m+r}(u)\,du = 1$, it follows that $\phi_{m+r}$ is the density
of a beta distributed random variable with parameters $\alpha = m + r + 1$ and $\beta = m + r + 1$,
implying $c_{m+r} = (2m + 2r + 1)!/((m + r)!)^2$. It is worth mentioning that $\phi_{m+r}^{(m+r)}$, restricted
to the domain $(0,1)$, is (up to translation/scaling) the $(m + r)$-th Legendre polynomial $L_{m+r}$,

$$\phi_{m+r}^{(m+r)}(x) = (-1)^{m+r}\,\frac{(2m + 2r + 1)!}{(m+r)!}\,L_{m+r}(2x - 1)$$

(this is essentially Rodrigues' representation, cf. Abramowitz and Stegun [1], p. 785). For
that reason, we can even compute

$$\|\phi_{m+r}^{(m+r)}\|_2 = \frac{(2m + 2r)!}{(m + r)!}\,\sqrt{2m + 2r + 1}.$$

In the particular case $r = 0$, $m = 1$ we obtain $\phi_1^{(1)}(x) \propto 1 - 2x$ and this is known from the
work of Dümbgen and Walther [13], where the authors use locally most powerful tests to
derive $\phi_1$.

To summarize, we can find the "optimal" kernel but it turns out that it has less smoothness
than is required by the conditions for Theorem 3, due to its behavior on the boundaries
$\{0, 1\}$. However, if the multiplicative inverse of the characteristic function of the noise
density can be written as a polynomial, we were able to prove the theorems under weaker
assumptions on $\phi$, including as a special case the optimal beta kernels.
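The closed-form expressions for the optimal kernel can be checked numerically. The sketch below builds $\phi_k$ for $k = m + r$ as a polynomial on $(0,1)$ and evaluates the $L^2$ norm of its $k$-th derivative; the helper names are chosen for this example and exact polynomial integration is used instead of quadrature.

```python
from math import factorial, sqrt

from numpy.polynomial import Polynomial


def optimal_kernel(k):
    """phi_k(x) = c_k x^k (1 - x)^k on (0, 1), with c_k = (2k + 1)! / (k!)^2."""
    c = factorial(2 * k + 1) / factorial(k) ** 2
    return c * Polynomial([0.0, 1.0]) ** k * Polynomial([1.0, -1.0]) ** k


def l2_norm_on_unit_interval(p):
    """L2 norm of the polynomial p restricted to [0, 1] (exact antiderivative)."""
    antiderivative = (p * p).integ()
    return sqrt(antiderivative(1.0) - antiderivative(0.0))


def derivative_norm(k):
    """Numerical value of || phi_k^{(k)} ||_2."""
    return l2_norm_on_unit_interval(optimal_kernel(k).deriv(k))
```

For example, `derivative_norm(3)` corresponds to the Beta(4,4) kernel, i.e. $m + r = 3$, and can be compared with $(2k)!/k! \cdot \sqrt{2k+1}$ for $k = 3$.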

5.2 Performance of the method 

In this part, we give some theoretical insights. We start by investigating Problem (iii) (cf.
Section 3). After that, we will address issues related to (ii) and (i). It is easy to see that
$\|v_{t,h}\|_2 \lesssim h^{1/2-m-r}$ and thus, $d_{t,h}$ and $d^{P}_{t,h}$ are of the same order. We can therefore restrict
ourselves in the following to the situation where the confidence statements are constructed
based on the approximation in Theorem 2. In the other case, similar results can be derived.

Problem (iii): Recall that with confidence $1 - \alpha$, for all $(t,h) \in B_n$,

$$\mathrm{graph}(\mathrm{op}(p)f) \cap \left([t,t+h] \times \left[\frac{T_{t,h} - d_{t,h}}{h\sqrt{n}},\; \frac{T_{t,h} + d_{t,h}}{h\sqrt{n}}\right]\right) \neq \emptyset.$$

The rectangles constructed in this way contain information on $\mathrm{op}(p)f$, where the amount of
information is directly linked to the size of the rectangle. Therefore, it is natural to think of the
area and the length of the diagonal as measures of localization quality. For the rectangle
above, the area is given by

$$\mathrm{area}(t,h) := 2d_{t,h}\,n^{-1/2} \sim h^{1/2-m-r}\,n^{-1/2}\sqrt{\log\frac{1}{h}}.$$

There is an interesting transition: Suppose that $m + r \le 1$ (this includes for instance
monotonicity in the direct case and exponential deconvolution). Then, $\mathrm{area}(t,h) \to 0$
uniformly in $h \in [l_n, u_n]$. In contrast, whenever $m + r > 1$,

$$h \gg (\log n/n)^{1/(2m+2r-1)} \;\Rightarrow\; \mathrm{area}(t,h) \to 0,$$

$$h \sim (\log n/n)^{1/(2m+2r-1)} \;\Rightarrow\; \mathrm{area}(t,h) = O(1),$$

$$h \ll (\log n/n)^{1/(2m+2r-1)} \;\Rightarrow\; \mathrm{area}(t,h) \to \infty.$$

On the other hand, the length of the diagonal behaves like $h \vee h^{-m-r-1/2}\,n^{-1/2}\sqrt{\log 1/h}$.
If the rectangle is a square, then $h \sim (\log n/n)^{1/(2m+2r+3)}$.
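To get a feeling for these regimes, the orders of magnitude can be tabulated numerically. The sketch below drops all constants (so only the scaling in $h$ and $n$ is meaningful), and the function names are assumptions made for this illustration.

```python
import math


def rectangle_scalings(h, n, m, r):
    """Orders of the height, area and diagonal of a rectangle at scale h (constants dropped)."""
    height = h ** (-0.5 - m - r) * n ** -0.5 * math.sqrt(math.log(1.0 / h))
    return {"height": height, "area": h * height, "diagonal": max(h, height)}


def critical_bandwidth(n, m, r):
    """Scale at which the area is of order one; requires m + r > 1/2."""
    return (math.log(n) / n) ** (1.0 / (2 * m + 2 * r - 1))


def square_bandwidth(n, m, r):
    """Scale at which width and height of the rectangle are of the same order."""
    return (math.log(n) / n) ** (1.0 / (2 * m + 2 * r + 3))
```

For instance, with $m = 1$, $r = 2$ and $n = 10^6$ the area computed at `critical_bandwidth` is of order one, while it grows quickly for smaller scales and shrinks for larger ones.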

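Problem (ii), (ii'): The following result gives a necessary condition in order to solve (ii).
Loosely speaking, it states that whenever

$$\mathrm{op}(p)f|_{[t,t+h]} > \frac{2d_{t,h}}{h\sqrt{n}},$$

the multiscale test returns a rectangle $[t,t+h] \times [b_-(t,h,\alpha), b_+(t,h,\alpha)]$ which is in the
upper half-plane with high probability. Or, to state it differently, we can reject that
$\mathrm{op}(p)f|_{[t,t+h]} \le 0$.

Theorem 4. Work under the assumptions of Theorem 2. Suppose that $\phi \ge 0$. Let $M^-$
denote the set of tuples $(t,h) \in B_n$ for which

$$\mathrm{op}(p)f|_{[t,t+h]} > \frac{2d_{t,h}}{h\sqrt{n}}.$$

Similarly, define $M^+ := \{(t,h) \in B_n \mid \mathrm{op}(p)f|_{[t,t+h]} < -2d_{t,h}/(h\sqrt{n})\}$. Then, if $b_+(t,h,\alpha)$
and $b_-(t,h,\alpha)$ are given by (4.3), we obtain

$$\lim_{n\to\infty} P\big(b_-(t,h,\alpha) > 0 \text{ for all } (t,h) \in M^- \text{ and } b_+(t,h,\alpha) < 0 \text{ for all } (t,h) \in M^+\big) \ge 1 - \alpha.$$

Proof. For all $(t,h) \in M^-$, conditionally on the event given by (4.1),

$$\mathrm{op}(p)f|_{[t,t+h]} > \frac{2d_{t,h}}{h\sqrt{n}} \;\Rightarrow\; \langle \phi_{t,h}, \mathrm{op}(p)f \rangle > \frac{2d_{t,h}}{\sqrt{n}} \;\Rightarrow\; T_{t,h} > d_{t,h} \;\Rightarrow\; b_-(t,h,\alpha) > 0.$$

Similarly, one can argue for $M^+$. $\square$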

In order to formulate the next result, let us define 
Corollary 1. Work under the assumptions of Theorem 2. Suppose that $\phi \ge 0$ and $\beta \in \mathbb{R}$.
Let $M^-$ denote the set of tuples $(t,h) \in B_n$ satisfying

$$\mathrm{op}(p)f|_{[t,t+h]} > \left(\frac{\log n}{n}\right)^{\beta/(2\beta+2m+2r+1)} \qquad (5.3)$$

and

$$h \ge C_\alpha \left(\frac{\log n}{n}\right)^{1/(2\beta+2m+2r+1)}.$$

Let $M^+$ be as $M^-$, with (5.3) replaced by $\mathrm{op}(p)f|_{[t,t+h]} < -(\log n/n)^{\beta/(2\beta+2m+2r+1)}$. Then,
if $b_-(t,h,\alpha)$ and $b_+(t,h,\alpha)$ are given by (4.3), we obtain

$$\lim_{n\to\infty} P\big(b_-(t,h,\alpha) > 0 \text{ for all } (t,h) \in M^- \text{ and } b_+(t,h,\alpha) < 0 \text{ for all } (t,h) \in M^+\big) \ge 1 - \alpha.$$

Proof. It holds that 

dt,h<\\fe\\'^^\\vt,h\\^^/2\ogj7/h{l + q^{T^{W))). 
For sufficiently large $n$, $h \ge l_n \ge \nu/n$. Therefore, we have for every $(t,h) \in M^-$,



2di,/^ ^ 



Similarly for $M^+$. Now, the result follows by applying Theorem 4. $\square$



The last result shows essentially that if $\mathrm{op}(p)f|_{[t,t+h]}$ is positive, precisely, $\mathrm{op}(p)f|_{[t,t+h]} \gtrsim
(\log n/n)^{\beta/(2\beta+2m+2r+1)}$, and if $h \sim (\log n/n)^{1/(2\beta+2m+2r+1)}$, then with probability $1 - \alpha$,
our method returns a rectangle in the upper half-plane. Another way to guarantee this is
by imposing the condition

$$\mathrm{op}(p)f|_{[t,t+h]} \gtrsim h^{\beta}. \qquad (5.4)$$

We have three distinct regimes:

$\beta > 0$: $\mathrm{op}(p)f|_{[t,t+h]} \to 0$, $h \to 0$,

$\beta = 0$: $\mathrm{op}(p)f|_{[t,t+h]} = O(1)$, $h \sim (\log n/n)^{1/(2m+2r+1)} \to 0$,

$-m-r-1/2 < \beta < 0$: $\mathrm{op}(p)f|_{[t,t+h]} \to \infty$, $h \to 0$.

It is insightful to compare the previous result to derivative estimation of a density if $m + r$
is a positive integer. As is well known, $D^{m+r} f$ can be estimated with rate of convergence

$$\left(\frac{\log n}{n}\right)^{\beta/(2\beta+2m+2r+1)}$$

under $L^\infty$-risk, assuming that $\mathrm{op}(p)f$ is Hölder continuous with index $\beta > 0$ and that
$h \sim (\log n/n)^{1/(2\beta+2m+2r+1)}$. This directly relates to the first case considered above.

Problem (i): At the beginning of Section 3 we shortly addressed the construction of confidence
statements for the number of roots and their location. Note that estimators derived in this
way, have many interesting features. On the one hand, we know that with probability 1 — a 
the estimated number of roots is a lower bound for the true number of roots. Therefore, 
these estimates do not come from a trade-off between bias and variance but they allow for 
a clear control on the probability to observe artefacts. It is worth mentioning that for this 
proper qualitative feature selection no additional penalization is required. In order to show 
that the lower bound for the number of roots is not trivial, we need to prove that whenever 
two roots are well-separated (for instance the distance between them shrinks not too fast), 
they will be detected eventually by our test. This property follows if we can show that the 
simultaneous confidence intervals for a fixed number of roots, say, shrink to zero. 
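One simple way to turn the sign information of the rectangles into a lower bound for the number of roots is sketched below. It is an illustration under stated assumptions, not necessarily the procedure used by the authors: the input format (intervals labeled with a sign) and the dynamic-programming search for a longest chain of ordered, non-overlapping intervals with alternating signs are choices made for this example.

```python
def min_number_of_roots(signed_intervals):
    """Lower bound for the number of roots of a continuous function.

    signed_intervals: iterable of (left, right, sign) with left < right, where
    sign = +1 means the function is positive somewhere on [left, right] and
    sign = -1 means it is negative somewhere on [left, right].
    """
    items = sorted(signed_intervals, key=lambda iv: iv[1])
    best = []  # best[i]: longest alternating chain of ordered intervals ending at i
    for i, (left_i, _right_i, sign_i) in enumerate(items):
        chain = 1
        for j in range(i):
            _left_j, right_j, sign_j = items[j]
            # Intervals may share an endpoint: points of opposite sign are distinct.
            if right_j <= left_i and sign_j != sign_i:
                chain = max(chain, best[j] + 1)
        best.append(chain)
    roots_from_chain = max(best, default=1) - 1
    # Opposite signs anywhere already force one root by the intermediate value theorem.
    signs = {sign for _, _, sign in items}
    return max(roots_from_chain, 1 if len(signs) == 2 else 0)
```

Each sign change along such a chain forces a root between two distinct points by the intermediate value theorem, so on the confidence event the returned number cannot exceed the true number of roots.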

Therefore, assume for simplicity that the number $K$ and the locations $(x_{0,j})_{j=1,\ldots,K}$ of the
zeros of $\mathrm{op}(p)f$ are fixed (but unknown) and $x_{0,j} \in (0,1)$ for $j = 1, \ldots, K$. For example,
these roots can be extreme/saddle points if $\mathrm{op}(p) = D$ or points of inflection if $\mathrm{op}(p) = D^2$.

In order to formulate the result, we need that Bn is sufficiently rich. Therefore, we assume 
that for all n, there exists a sequence {Nn), Nn > ^i/('2m+2r+i) iQg4 ^^^ such that 

i I 



Nn Nn 



A; = 0, 1, ... , 1 = 1,2,..., k + l<Nn} CBn. 

Assume further that in a neighborhood of the roots $x_{0,j}$, $\mathrm{op}(p)f$ behaves like

$$\mathrm{op}(p)f(x) = \gamma\,\mathrm{sign}(x - x_{0,j})\,|x - x_{0,j}|^{\beta} + o(|x - x_{0,j}|^{\beta}),$$

for some positive /3 E (0, 1]. Let p„ = (logn/n)V(2/3+2m+2r+i)2/^i//3 ^nd Ca,M^ as defined 
in Corollary 1 There exist integer sequences {kj^)j^n, {kt^j,n-, {ln)n such that for all 
sufficiently large n, 

k~ k~^ I 

Pn< -^-xo,j <2pn, -2pn< -^-XQj < -pn, and Ca7^/''p„ < -^ < 2Co7^/^p„. 

Some calculations show that {kJ^/Nn,ln/Nn) G M^ and ((A:^„ — ln)/Nn,ln/Nn) G M+ for 
j = 1, . . . ,K. We can conclude from Corollary [T] and the construction, that for j = 1, . . . ,K, 
the confidence intervals have to be a subinterval of 

^j> ~ '"IT- ^j,n + '""' 

Nr.. ' Nr,. 






Hence, the length for each confidence interval is bounded from above by 

l/(2/3+2m+2r+l) 

4(^7'/'' + IK 



n 

As $n \to \infty$ the confidence intervals shrink to zero, and will therefore become disjoint
eventually. This shows that our estimator for the number of roots picks asymptotically the
correct number with high probability. Observe that for localization of modes in density
estimation, $(m,r,\beta) = (1,0,1)$, the rate $(\log n/n)^{1/5}$ is indeed optimal up to the log-factor
(cf. Hasminskii [23]). The rate $(\log n/n)^{1/7}$ for localization of inflection points in density
estimation, $(m,r,\beta) = (2,0,1)$, coincides with the one found in Davis et al. [9].

For the special case of mode estimation in density deconvolution let us shortly comment on 
related work by Rachdi and Sabre |j3^ and Wieczorek [38]. In [38] optimal estimation of 
the mode under relatively restrictive conditions on the smoothness of / is considered. In 
contrast, Rachdi and Sabre find the same rates of convergence n~^'^'^^^^> (but with respect 
to the mean-square error). Under the stronger assumption that D^ f exists they also provide 
confidence bands which converge at a different rate, of course. 

5.3 On calibration of multiscale statistics 

Let us shortly comment on the type of multiscale statistic derived in Theorems 1-3.
Following [12], p. 139, we can view the calibration of the multiscale statistics (2.5), (3.8), and
(3.12) as a generalization of Lévy's modulus of continuity. In fact, the supremum is
attained uniformly over different scales, making this calibration particularly attractive for
the construction of adaptive methods.

One of the restrictions of our method, compared to other works on multiscale statistics,
is that we exclude the coarsest scales, i.e. $h \le u_n = o(1)$ (cf. Theorem 1). Otherwise the
approximating statistic would not be distribution-free. However, excluding the coarsest
scales is a very weak restriction since the important features of $\mathrm{op}(p)f$ can already be
detected at scales tending to zero with a certain rate. For instance, in view of Corollary 1
the multiscale method detects a deviation from zero, i.e. $\mathrm{op}(p)f|_I \ge C > 0$, provided the
length of the interval $I$ is larger than const. $\times (\log n/n)^{1/(2m+2r+1)}$. This can also be seen by
numerical simulations, as outlined in the next section.



6 Numerical simulations 

We will illustrate our method by investigating monotonicity of / (op(p) = D, cf. Example 
[ij) under Laplace-deconvolution, i.e. fe{x) = 9~^e~'^''^ with 9 = 0.075. In this case, we 
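find

$$\mathcal{F}(f_\epsilon)(t) = \langle \theta t \rangle^{-2} = (1 + \theta^2 t^2)^{-1} \quad \text{and} \quad \mathrm{op}(p)^\star f = -Df,$$

and the statistic (3.5) takes the explicit form

$$T_{t,h} = \frac{1}{h\sqrt{n}} \sum_{k=1}^{n} \left(\frac{\theta^2}{h^2}\,\phi^{(3)}\Big(\frac{Y_k - t}{h}\Big) - \phi^{(1)}\Big(\frac{Y_k - t}{h}\Big)\right).$$

Figure 2: Boxplots for three different values ($n = 200$, $n = 1000$, $n = 10{,}000$) of the
approximating statistic (6.1).
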


As kernel $\phi$, we select the density of a Beta(4, 4) random variable (cf. Section 5.1). Moreover,
we choose $u_n = 1/\log\log n$ for the multiscale statistic and define



Bn 



k I 



\n 



0.61 



fc = 0,l,..., Z = l,2,...,[iV„u„], A; + /<l}, for 7V„ = [r 

Note that Assumptions 3 and 4 hold for $(A, \rho, r, \beta_0) = (\theta^2, 0, 2, 2)$ and $(\mu, m) = (1, 1)$,
respectively. Then, the multiscale statistics



sup Wh /.^jTSn2 



21og(^) 



and 



-iP.OO 



(W) = sup Wh 

it,h)eB„ 






21og(0 



(6.1) 



have a particular simple form. 
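The explicit statistic for Laplace deconvolution with the Beta(4, 4) kernel can be sketched as follows. The formula coded here is the one obtained from the identity $f = g - \theta^2 g''$ for Laplace errors with characteristic function $(1+\theta^2 t^2)^{-1}$, combined with integration by parts; the function and variable names are assumptions made for this example.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Beta(4, 4) density on [0, 1]: phi(x) = 140 x^3 (1 - x)^3.
PHI = 140.0 * Polynomial([0.0, 1.0]) ** 3 * Polynomial([1.0, -1.0]) ** 3
PHI_D1 = PHI.deriv(1)
PHI_D3 = PHI.deriv(3)


def laplace_statistic(y, t, h, theta):
    """T_{t,h} for op(p) = D under Laplace errors with scale theta.

    y : observations Y_1, ..., Y_n (array-like)
    t : left end of the interval [t, t + h]
    h : scale / bandwidth
    """
    y = np.asarray(y, dtype=float)
    u = (y - t) / h
    inside = (u >= 0.0) & (u <= 1.0)  # the kernel vanishes outside [0, 1]
    summand = np.where(inside, theta ** 2 / h ** 2 * PHI_D3(u) - PHI_D1(u), 0.0)
    return summand.sum() / (h * np.sqrt(y.size))
```

As a plausibility check, for $X$ with density $f(x) = 2x$ on $[0,1]$ the derivative equals 2 in the interior, so `laplace_statistic(y, t, h, theta) / (h * sqrt(n))` should be close to 2 for large $n$ when $[t, t+h]$ lies inside $(0,1)$.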



Boxplots for the distributions $T^{P,\infty}_{200}(W)$, $T^{P,\infty}_{1000}(W)$, and $T^{P,\infty}_{10000}(W)$ are displayed in Figure 2
for 10,000 simulations each. These plots show that the distribution is well-localized with
only a few outliers. As proved, the approximating statistic is almost surely bounded as
$n \to \infty$. For increasing sample size, however, Figure 2 indicates that the quantiles of the
distributions $T_n^{P,\infty}(W)$ increase slowly.

Figure 3: Simulation for sample size $n = 1000$ and 90%-quantile. Upper display: True
density $f$ (dashed) and convoluted density $g$ (solid). Lower display: Line plot of the
endpoints of intervals solving Problems (ii) and (ii') as well as minimal solutions to (ii)
and (ii') (horizontal lines above/below).
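A Monte Carlo draw of the approximating statistic can be sketched as below. Several ingredients are assumptions of this illustration rather than statements of the text: the statistic is taken to be $\sup_{(t,h)} w_h\big(|\int \phi^{(3)}((s-t)/h)\,dW_s| / (\sqrt{h}\,\|\phi^{(3)}\|_2) - \sqrt{2\log(\nu/h)}\big)$ with $w_h = \sqrt{\tfrac12 \log(\nu/h)}/\log\log(\nu/h)$, the Brownian motion is discretized on a regular grid, and the grid sizes are free parameters.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Third derivative of the Beta(4, 4) density and its L2 norm on [0, 1].
PSI = (140.0 * Polynomial([0.0, 1.0]) ** 3 * Polynomial([1.0, -1.0]) ** 3).deriv(3)
_SQ = (PSI * PSI).integ()
PSI_NORM = np.sqrt(_SQ(1.0) - _SQ(0.0))


def standardized_weights(t, h, s):
    """psi((s - t)/h) / (sqrt(h) * ||psi||_2) on the grid s, zero outside [t, t + h]."""
    u = (s - t) / h
    inside = (u >= 0.0) & (u <= 1.0)
    return np.where(inside, PSI(u), 0.0) / (np.sqrt(h) * PSI_NORM)


def simulate_statistic(n_grid, max_l, m_fine, rng, nu=np.exp(np.e ** 2)):
    """One draw of the (assumed form of the) approximating statistic.

    Scales h = l / n_grid for l = 1..max_l, locations t = k / n_grid with k + l <= n_grid;
    the Brownian motion is approximated by m_fine independent Gaussian increments.
    """
    s = (np.arange(m_fine) + 0.5) / m_fine
    dW = rng.normal(scale=np.sqrt(1.0 / m_fine), size=m_fine)
    best = -np.inf
    for l in range(1, max_l + 1):
        h = l / n_grid
        log_term = np.log(nu / h)
        w_h = np.sqrt(0.5 * log_term) / np.log(log_term)
        for k in range(0, n_grid - l + 1):
            z = abs(np.dot(standardized_weights(k / n_grid, h, s), dW))
            best = max(best, w_h * (z - np.sqrt(2.0 * log_term)))
    return best
```

Repeating `simulate_statistic` many times with independent draws and taking an empirical quantile gives an approximation of $q_\alpha$; the discretization level `m_fine` should be large relative to `n_grid` so that each kernel window contains many grid points.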

In Figures 3 and 4 we give an example of a reconstruction based on a sample size of
$n = 1000$ and confidence level equal to 90%. Based on 10,000 repetitions, the estimated
quantile is $q_{0.1}(T^{P,\infty}_{1000}(W)) = -0.41$. For the simulation, we use $\nu = \exp(e^2)$ because then
$h \mapsto \sqrt{\log(\nu/h)}/\log\log(\nu/h)$ is monotone as long as $0 < h \le 1$ (cf. Lemma C.3 (i)).

The upper display of Figure 3 shows the true density $f$ as well as the convoluted density
$g$. Note that $g$ is very smooth and, like the other densities, non-observable (we only
have observations, which are distributed with density $g$). In fact, by visual inspection of
$g$, it becomes apparent how difficult it is to find the intervals on which $f$ is monotone
increasing/decreasing.

The lower plot of Figure 3 displays minimal intervals which are solutions to Problems
(ii) and (ii') (horizontal lines above and below the line plot, respectively). Here, minimal
intervals for (ii) and (ii') denote the intervals for which no proper subinterval exists with the
same property. The line plot itself depicts the endpoints of all intervals belonging to (ii) and
(ii'). Note that the possible values for the endpoints are given by $k/N_n$, $k = 0, 1, \ldots, N_n$.
If for given $k$ there is more than one interval solving (ii) or (ii') with endpoint $k/N_n$, the
line width is increased accordingly. For more on this type of plotting, see Dümbgen and
Walther [13].

Figure 4: True (unobserved) derivative $f'$ and minimal rectangles (left) as well as sparse
minimal rectangles/midpoints (right) for the same data set as in Figure 3.



The density $f$ has been designed in order to investigate Corollary 1 numerically. Indeed,
on $[0, 0.35]$, the signal (in this case $|f'|$) is on average large but the intervals on which $f$
increases/decreases are comparably small. In contrast, on $[0.35, 1]$, $|f'|$ is small and there
is only one increase/decrease.

The test is able to find two regions of increase and two regions where the density decreases.
The increase and decrease on the leftmost position are not detected by our test. Repetition
of the simulation shows that the decrease on the intervals $[0.25, 0.35]$ and $[0.55, 1]$ is found most of
the time, while the increases (on $[0.17, 0.25]$ and $[0.35, 0.55]$) are detected less often.
Furthermore, compared to the true function $f$, it can be seen that the difficulty lies in
precise localization of the regions of increase/decrease.

In Figure 4 the derivative of $f$ as well as the minimal rectangles, additionally satisfying
either $b_-(t,h,\alpha) > 0$ or $b_+(t,h,\alpha) < 0$, are displayed. For better visualization, we have
depicted the midpoints of these rectangles and a sparse subset (right display in Figure 4)
using the following reduction step:

(C): Let $R$ be the rectangle with the smallest area and denote by $S$ the set of rectangles
having non-empty intersection with $R$. Find the rectangle $R'$ in $S$ minimizing the area of
intersection with $R$. Display $R$ and $R'$ and discard $R$ and all the rectangles in $S$. If there
are rectangles left, start from the beginning.
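The reduction step (C) can be sketched as a short greedy loop. The representation of a rectangle as a tuple `(x0, x1, y0, y1)`, the function names, and the treatment of rectangles that merely touch as intersecting are assumptions made for this illustration.

```python
def area(rect):
    x0, x1, y0, y1 = rect
    return (x1 - x0) * (y1 - y0)


def overlap(a, b):
    """Width and height of the intersection; a negative value means the rectangles are disjoint."""
    width = min(a[1], b[1]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[2], b[2])
    return width, height


def intersects(a, b):
    width, height = overlap(a, b)
    return width >= 0 and height >= 0


def overlap_area(a, b):
    width, height = overlap(a, b)
    return max(width, 0.0) * max(height, 0.0)


def sparse_subset(rects):
    """Greedy selection following step (C); returns the rectangles to display."""
    remaining = list(rects)
    shown = []
    while remaining:
        r = min(remaining, key=area)                       # smallest rectangle R
        s = [q for q in remaining if q is not r and intersects(q, r)]
        shown.append(r)
        if s:                                              # R' has the smallest overlap with R
            shown.append(min(s, key=lambda q: overlap_area(q, r)))
        remaining = [q for q in remaining if q is not r and not intersects(q, r)]
    return shown
```

If no other rectangle intersects $R$, only $R$ itself is displayed in that round.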

By construction, we find as before two regions of increase and decrease. Compared to the
multiscale solutions of Problems (ii) and (ii') (cf. Figure 3), we also obtain surprisingly
precise information on the derivative of $f$. Observe that the graph of $f'$ tends to cut the
rectangles through the middle. Therefore, the midpoints of the rectangles (depicted as
crosses in Figure 4) can be used for instance for estimation of maxima.

Figure 4 also shows nicely why a multiscale approach can provide additional insight
compared to a one-scale method. Consider the rectangles $R_1$ and $R_2$ in the right display of
Figure 4 and denote by $(t_1,h_1)$ and $(t_2,h_2)$ the corresponding indices in $B_n$ (as in (4.4)).
Note that $R_1$ and $R_2$ belong to more or less the same value in the time domain, i.e. $t_1 \approx t_2$,
but different bandwidths $h_1, h_2$. Therefore, we may view $R_1$ and $R_2$ as a superposition
of confidence statements on different scales. Since $R_1$ yields the better resolution in the
$t$-coordinate and $R_2$ the better resolution in the $y$-coordinate, different qualitative
statements can be inferred at the same time point. More practically, we would use $R_2$ in order to
construct a confidence statement as in the lower display of Figure 3, and from $R_1$ we obtain
the better bound for $\inf f'$. This would be impossible for any one-scale method.

7 Outlook and Discussion 

We have investigated multiscale methods in order to analyze shape constraints expressed 
as pseudo-differential operator inequalities in deconvolution models. Compared to previous 
work, a more refined multiscale calibration has been considered using an idea of proof based 
on KMT results together with tools from the theory of pseudo-differential operators. We 
believe that the same strategy can be applied to a variety of other problems. In particular, it 
is to be expected that similar results will hold for regression and spectral density estimation. 

Our multiscale approach allows us to identify intervals such that for given significance level
we know that $\mathrm{op}(p)f > 0$ at least on a subinterval. As outlined above, these results
allow for qualitative inference, for example the construction of confidence bands for the roots
of $\mathrm{op}(p)f$. Since we only required that $\mathrm{op}(p)f$ is continuous, $\mathrm{op}(p)f$ can be highly oscillating.
In this framework, it is therefore impossible to obtain strong confidence statements in the
sense that we find intervals on which $\mathrm{op}(p)f$ is always positive. By adding bias controlling
smoothness assumptions, such as Hölder conditions, stronger results can be
obtained, resulting for instance in uniform confidence bands.

Obtaining multiscale results for error distributions as in Assumption 2 is already a very
difficult topic on its own, and extension to the severely ill-posed case, including Gaussian
deconvolution, becomes technically challenging since the theory of pseudo-differential
operators has, to the best of our knowledge, not been formulated on the induced function spaces
so far. Therefore we intend to treat this in a subsequent paper.

Restricting to shape constraints which are associated with pseudo-differential operators
appears to be a limitation of our method, since important shape constraints, such as
curvature, cannot be handled within this framework and we may only work with
linearizations (which is quite common in physics and engineering). Allowing for non-linearity is a
very challenging task for further investigations. We are further aware of the fact that many
other important qualitative features are related to integral transforms (that are in general
not of convolution type) and they do not have a representation as pseudo-differential
operator. For instance, complete monotonicity and positive definiteness are by Bernstein's
and Bochner's Theorem connected to the Laplace transform and Fourier transform,
respectively. They cannot be handled with the methods proposed here and are subject to further
research.
Appendix A

Throughout the appendix, let 



Mog- 



logs 



log log ^' log log I ■ 

Furthermore, we often use the normalized differential $\bar{d}\xi := (2\pi)^{-1}\,d\xi$.

Proof of Theorem^ Let us study in a first step the statistic 

„(i) \Tt^h-^Tt^h\ 

T^ > = sup Wh—- ^=^ - Wh- 

{t,h)&B„ Vt^h ^g{t) 

Note that $T_n^{(1)}$ is the same as $T_n$, but $\hat g_n$ is replaced by $g$. We will show that there exists a
(two-sided) Brownian motion $W$, such that with



T^ >(W) := sup Wh- ^j= - Wh, 

{i,/i)GB„ VtM V9{t) 



we have 



sup \TW-Ti'\W)\=op{rn). (A.l) 

The main argument is based on the standard version of KMT (cf. [31]). In order to state
the result, let us define a Brownian bridge on the index set $[0, 1]$ as a centered Gaussian
process $(B(f))_{f \in \mathcal{F}}$, $\mathcal{F} \subset L^2([0, 1])$, with covariance structure

$$\mathrm{Cov}(B(f), B(g)) = \langle f, g \rangle - \langle f, 1 \rangle \langle g, 1 \rangle.$$

Let $\mathcal{F}_0 := \{x \mapsto \mathbf{1}_{[0,s]}(x) : s \in [0,1]\}$. Note that $(B(f))_{f \in \mathcal{F}_0}$ coincides with the classical
definition of a Brownian bridge. For $U_i \sim U[0, 1]$, i.i.d., the uniform empirical process on
the function class $\mathcal{F}$ is defined as

$$\mathbb{U}_n(f) = \sqrt{n}\left(\frac{1}{n}\sum_{i=1}^{n} f(U_i) - \int f(x)\,dx\right), \quad f \in \mathcal{F}.$$

In particular note that

$$T_{t,h} - E\,T_{t,h} = \mathbb{U}_n(\psi_{t,h} \circ G^{-1}),$$

where $G^{-1}$ denotes the quantile function of $Y$. For convenience, we restate the celebrated
KMT inequality for the uniform empirical process.

Theorem 5 (KMT on [0,1], cf. [31]). There exist versions of $\mathbb{U}_n$ and a Brownian bridge
$B$ such that for all $x$

$$P\Big(\sup_{f \in \mathcal{F}_0} |\mathbb{U}_n(f) - B(f)| > n^{-1/2}(x + C\log n)\Big) < K e^{-\lambda x},$$

where $C, K, \lambda > 0$ are universal constants.

However, we need a functional version of KMT. We shall prove this by using the theorem
above in combination with a result due to Koltchinskii [30] (Theorem 11.4, p. 112), stating
that the supremum over a function class $\mathcal{F}$ behaves as the supremum over the symmetric
convex hull $\mathrm{sc}(\mathcal{F})$, defined by

$$\mathrm{sc}(\mathcal{F}) := \Big\{\sum_{i=1}^{\infty} \lambda_i f_i : f_i \in \mathcal{F},\ \lambda_i \in [-1,1],\ \sum_{i=1}^{\infty} |\lambda_i| \le 1\Big\}.$$

Theorem 6. Assume there exists a version $B$ of a Brownian bridge, such that for a sequence
$(\delta_n)_n$ tending to 0,

$$P^*\Big(\sup_{f \in \mathcal{F}} |\mathbb{U}_n(f) - B(f)| > \delta_n(x + C\log n)\Big) < K e^{-\lambda x},$$

where $C, K, \lambda > 0$ are constants depending only on $\mathcal{F}$. Then, there exists a version $B$ of a
Brownian bridge, such that

$$P^*\Big(\sup_{f \in \mathrm{sc}(\mathcal{F})} |\mathbb{U}_n(f) - B(f)| > \delta_n(x + C'\log n)\Big) < K' e^{-\lambda' x}$$

for constants $C', K', \lambda' > 0$.

In Theorem 6, $P^*$ refers to the outer measure; however, for the function class considered in
this paper, we have measurability of the corresponding event and hence may replace $P^*$ by
$P$. It is well-known (cf. Giné et al. [18], p. 172) that

$$\{\rho \mid \rho : \mathbb{R} \to \mathbb{R},\ \mathrm{supp}\,\rho \subset [0,1],\ \rho(1) = 0,\ \mathrm{TV}(\rho) \le 1\} \subset \mathrm{sc}(\mathcal{F}_0). \qquad (A.2)$$

Now, assume that $\rho : \mathbb{R} \to \mathbb{R}$ is such that $\mathrm{TV}(\rho) + 3|\rho(1)| \le 1$. Define $\tilde\rho = (\rho - \rho(1)\mathbf{1}_{[0,1]})/(1 -
|\rho(1)|)$ and observe that $\mathrm{TV}(\tilde\rho) \le 1$ and $\tilde\rho(1) = 0$. By (A.2) there exist $\lambda_1, \lambda_2, \ldots \in \mathbb{R}$ and
$t_1, t_2, \ldots \in [0,1]$ such that $\tilde\rho = \sum_i \lambda_i \mathbf{1}_{[0,t_i]}$ and $\sum_i |\lambda_i| \le 1$. Therefore, $\rho = (1 - |\rho(1)|)\tilde\rho +
\rho(1)\mathbf{1}_{[0,1]}$ can be written as a linear combination of indicator functions, such that the sum of
the absolute values of the weights is bounded by 1. This shows

$$\{\rho \mid \rho : \mathbb{R} \to \mathbb{R},\ \mathrm{supp}\,\rho \subset [0,1],\ \mathrm{TV}(\rho) + 3|\rho(1)| \le 1\} \subset \mathrm{sc}(\mathcal{F}_0).$$
Since $\mathrm{TV}(\psi_{t,h} \circ G^{-1}) \le \mathrm{TV}(\psi_{t,h})$ it follows by Assumption 1 (ii) that the function class

$$\mathcal{F}_n := \{C_\star V_{t,h}^{-1}\sqrt{h}\;\psi_{t,h} \circ G^{-1} : (t,h) \in B_n,\ G \in \mathcal{G}\}$$

is a subset of $\mathrm{sc}(\mathcal{F}_0)$ for a sufficiently small constant $C_\star$. Combining Theorems 5 and 6 shows
for $\delta_n = n^{-1/2}$ that there are constants $C', K', \lambda'$ and a Brownian bridge $(B(f))_{f \in \mathrm{sc}(\mathcal{F}_0)}$
such that for $x > 0$,

Vh\M'^t,hoG-^)-B{i;t^hoG-^)\ _,, X _y^ 

sup C^ ' ^ — ■ ^y ^ — ■ —>n ' {x + C logn)] < K e . 



it,h)eB,„ Geg 
Due to Lemma 



V, 



t,h 



C.3 (i) and In > i^/n for sufficiently large n, we have that wi,^ < Wyin- This 



readily implies with x = log n, 

\Tt^h-^Tt,h\-\B{ilJt,hoG-^) 



sup Wh 

{t,h)eBn, Gee 



Vt,h ^jW) 



Op(l-^l^n-^l^w,in\oi 



n 



Now, let us introduce the (general) Brownian motion $W(f)$ as a centered Gaussian process with covariance $E[W(f)W(g)] = \langle f, g \rangle$. In particular, $W(f) = B(f) + (\int f)\,\xi$, with $\xi \sim \mathcal{N}(0,1)$ independent of $B$, defines a Brownian motion and hence there exists a version of $(W(f))_{f \in \mathrm{sc}(\mathcal{F}_0)}$ such that $B(f) = W(f) - (\int f)\,W(1)$. We have

JtptA'^) dG{u)\ _^ \\A,hh 

< C sup Wh ^^ 

(t,h)£B„, Geg Vt^h Vg{t) 



sup Wh - 

{t,h)eB„, Geg Vt^h V9{t) 



< sup Whh^^'^ < Wu„u, 

h£[ln,Un] 



1/2 



where the second inequality follows from Assumption 1 (ii) and the last inequality from Lemma C.3 (ii). This implies further

Wh 



E 



Vt,h yW) 



B{i^t,hoG-^)\-\W{iJt,hoG-^)\ 



T^ 



and therefore 



sup 



^(1) \W{i^t,hoG-^)\ ^ 

T^ > - sup Wh — y== Wh 

{t,h)eB„ Vt^h Vdi't) 



Op{ln^''^n ^/'^wiinlogn + wu^ull'^), 



and 



sup 
Geg 



T^i) - Tf )(t^) = Op(/-i/2n-i/2u;i/„logn + Wu^u]; 



/^\ 



.(l)^ 



In the last equality we have used that {W^ )te[o,i\ — (^(I[o,t](")))te[o,i] ^^d 



(m: 
are (two-sided) standard Brownian motions, proving $W(\psi_{t,h} \circ G^{-1}) = \int \psi_{t,h}(s) \sqrt{g(s)}\, dW_s$ and hence (A.1). Further note that Assumption 1 (iii) together with Lemma B.6 shows that



sup 
Geg 



Ti'Hw) 



sup Wh- f7 '- - Wh 



(t,h)£Bn 



Vt 



t,h 



Op{Kn). 



In a final step let us show that (2.7) is almost surely bounded. In order to establish the result, we use Theorem 6.1 and Remark 1 of Dümbgen and Spokoiny [12]. We set $\rho((t,h),(t',h')) = (|t - t'| + |h - h'|)^{1/2}$. Further, let $X(t,h) = \sqrt{h}\, V_{t,h}^{-1} \int \psi_{t,h}(s)\, dW_s$ and $\sigma(t,h) = h^{1/2}$.

By assumption, $X$ has continuous sample paths on $T$ and obviously, for all $(t,h), (t',h') \in T$,

$$\sigma^2(t,h) \le \sigma^2(t',h') + \rho^2((t,h),(t',h')).$$

Let $Z \sim \mathcal{N}(0,1)$. Since $X(t,h)$ is a Gaussian process and $V_{t,h} \ge \|\psi_{t,h}\|_2$, we have $P(X(t,h) > \sigma(t,h)\eta) \le P(Z > \eta) \le \exp(-\eta^2/2)$ for any $\eta > 0$. Further, denote by

$$A_{t,t',h,h'} := \Big\| \frac{\psi_{t,h}\sqrt{h}}{V_{t,h}} - \frac{\psi_{t',h'}\sqrt{h'}}{V_{t',h'}} \Big\|_2. \qquad (A.3)$$

Because of $P(|X(t,h) - X(t',h')| \ge A_{t,t',h,h'}\,\eta) \le 2\exp(-\eta^2/2)$ we have by Lemma B.5, for a universal constant $K > 0$,

$$P\big( |X(t,h) - X(t',h')| \ge \rho((t,h),(t',h'))\,\eta \big) \le 2\exp\big( -\eta^2/(2K^2) \big).$$
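The sub-Gaussian tail bound $P(Z > \eta) \le \exp(-\eta^2/2)$ for a standard normal $Z$, used above to control $X(t,h)$, can be checked numerically. The following is a minimal sketch, not part of the original argument; the function names are illustrative choices.

```python
import math


def gaussian_upper_tail(eta: float) -> float:
    """Exact upper tail P(Z > eta) of a standard normal, via erfc."""
    return 0.5 * math.erfc(eta / math.sqrt(2.0))


def sub_gaussian_bound(eta: float) -> float:
    """The bound exp(-eta^2 / 2) used in the proof."""
    return math.exp(-eta * eta / 2.0)


def tail_bound_holds(etas) -> bool:
    """Check P(Z > eta) <= exp(-eta^2 / 2) for every eta in the iterable."""
    return all(gaussian_upper_tail(e) <= sub_gaussian_bound(e) for e in etas)
```

For instance, `tail_bound_holds(k / 100 for k in range(1, 1000))` evaluates the bound on a grid of positive thresholds.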

Finally, we can bound the entropy M{{6u)^'^,{{t,h) ^ T : h < 6}) similarly as in [T^], p. 
145. Therefore, application of Remark 1 in [12] shows that 



S := sup 



ypogf I jAMs)dWs\ ^log(i)log(f) 



(t,h)(^T log (e log I) Vt^h 
is almost surely bounded from above. Define 



log (e log f) 



S' := sup 



ilog^ \Ji;t,,{s)dWs\ Jlog(i)log(^) 



log log ^ Vt^h 



log log ^ 



If e < i^ < e^, then 



loglog)^ = log ^log^^,^ >loglogi.-l + log(elogf) 
implies

log (e log f) ^ 1 ^ ^ 

log log ^ ~ log log z^ 

Furthermore, logz^//i < (logz^)(loge//i). Suppose now that S" > (otherwise 5' is bounded 
from below by 0). Then, S' < S and hence S' is almost surely bounded. Finally, 



Therefore, (2.7) holds, i.e. 



^og'iWlogl-Jlog^\<logu. 



\ fiJt,his)dWs\ ^ 

sup Wh- TZ - Wh 

{t,h)eT ^t,h 



is almost surely bounded. 



In the last step, let us prove that supcgg^ J,, |T„— T„ | = Op{supQ^g H^n— (7||oo logn/loglogn). 
For sufficiently large n and because G G Q,gn > c/2 for all t E [0, 1]. Therefore using Lemma 



C^(i), 



It. ^(1)1 ^ |V-E[Tt,fe]|supggg||g„-g|| 

sup \Tn-Ty\< sup Wh—- ^== ^-7- 

Geg (t,h)eB„, Geg Vt^h \/ g{t) 9n{t) 

2supGe£; ||ff" -5|L \Tt.h-^Tt.h]\ 

< — sup Wh- ' 



c 

2supG6g 9n 


-9 oo 


c 
2supGeg 9n 


-9 oo 



(i,/i)GB„, Geg Vt^h \fgit) 

< —\T\ > + sup Wh) 

he[in,u„] 

< —^^^\\-- "ll- (TW + 0(-ig^)). (A.4) 

~ c ^ log log n^ 

Since T„ is a.s. bounded by Theorem [l| the result follows. D 

Remark 2. Next, we give a proof of Theorem 2. In fact we prove a slightly stronger version, which does not necessarily require the symbol $a$ to be elliptic and $V_{t,h} = \|v_{t,h}\|_2$. It is only assumed that

(i) $V_{t,h} \ge \|v_{t,h}\|_2$,

(ii) there exist constants $c_V, C_V$ with $0 < c_V \le h^{m+r-1/2} V_{t,h} \le C_V < \infty$,

(iii) for all $(t,h), (t',h') \in T$ and whenever $h \le h'$ it holds that $h^{m+r} |V_{t,h} - V_{t',h'}| \le C_V (|t - t'| + |h - h'|)^{1/2}$.

Note that, as a special case, these conditions are satisfied for $V_{t,h} = \|v_{t,h}\|_2$ if $\mathrm{Op}(a)$ is elliptic. This follows directly from Lemmas B.2 and B.4.



Proof of Theorem 2. In order to prove the statements it is sufficient to check the conditions of Theorem 1. For $h > 0$, define the symbol



a*^(x, ■■= h'^a^xh + t, h'^^). 



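(A.5)

Under the imposed conditions and by Remark B.1 we may apply Lemma B.3 for $a^{(t,h)} = a^\star_{t,h}$ and therefore, uniformly over $(t,h) \in T$ and $u, u' \in \mathbb{R}$,

(I) $|v_{t,h}(u)| \lesssim h^{-m-r} \min\big( 1, \frac{h^2}{(u-t)^2} \big)$,

(II) $|v_{t,h}(u) - v_{t,h}(u')| \lesssim h^{-m-r-1} |u - u'|$ and if $u, u' \ne t$,

$$|v_{t,h}(u) - v_{t,h}(u')| \lesssim h^{1-m-r} \frac{|u - u'|}{|u' - t|\,|u - t|} = h^{1-m-r} \Big| \int_{u'}^{u} \frac{1}{(x-t)^2}\, dx \Big|.$$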

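Using (I), we obtain $\|v_{t,h}\|_\infty \lesssim h^{-m-r}$ and $\|v_{t,h}\|_1 \lesssim h^{1-m-r}$. In order to show that the total variation is of the right order, let us decompose $v_{t,h}$ further into $v^{(1)}_{t,h} = v_{t,h}\mathbb{I}_{[t-h,t+h]}$ and $v^{(2)}_{t,h} = v_{t,h} - v^{(1)}_{t,h}$. By (II), $\mathrm{TV}(v^{(1)}_{t,h}) \lesssim h^{-m-r}$ and

$$\mathrm{TV}(v^{(2)}_{t,h}) \lesssim h^{-m-r} + h^{1-m-r} \int_{t+h}^{\infty} \frac{1}{(x-t)^2}\, dx \lesssim h^{-m-r}.$$

Since $\mathrm{TV}(v_{t,h}) \le \mathrm{TV}(v^{(1)}_{t,h}) + \mathrm{TV}(v^{(2)}_{t,h}) \lesssim h^{-m-r}$, this shows together with Remark 2 that part (ii) of Assumption 1 is satisfied.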
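The step from the pointwise envelope in (I) to the $L^1$ bound can be made concrete: integrating $h^{-m-r}\min(1, h^2/(u-t)^2)$ over $u \in \mathbb{R}$ gives exactly $4h^{1-m-r}$ (a contribution $2h$ from $|u-t| \le h$ and $2h$ from the tails, times $h^{-m-r}$). The sketch below compares this closed form with a truncated midpoint rule. It is only an illustration; the function names, the truncation width and the grid size are assumptions made here.

```python
def envelope(u: float, t: float, h: float, m_plus_r: float) -> float:
    """Envelope h^{-(m+r)} * min(1, h^2 / (u - t)^2) from bound (I)."""
    d = abs(u - t)
    return h ** (-m_plus_r) * (1.0 if d <= h else (h / d) ** 2)


def envelope_l1_closed_form(h: float, m_plus_r: float) -> float:
    """Exact integral of the envelope over the real line: 4 * h^{1-(m+r)}."""
    return 4.0 * h ** (1.0 - m_plus_r)


def envelope_l1_numeric(t: float, h: float, m_plus_r: float,
                        width: float = 200.0, cells: int = 200_000) -> float:
    """Midpoint rule on [t - width*h, t + width*h]; the neglected tails
    contribute a relative error of about 1 / (2 * width)."""
    a = t - width * h
    step = 2.0 * width * h / cells
    total = 0.0
    for k in range(cells):
        total += envelope(a + (k + 0.5) * step, t, h, m_plus_r)
    return total * step
```

With, say, `h = 0.1` and `m_plus_r = 1.5`, the numerical value should agree with the closed form up to the truncation error.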

In the next step we verify Assumption 1 (iii) with «.„ = su'pui^\^g^Whh^''^log{l/h) < 
Un log^"n (cf. Lemma C.3, (ii)), i.e. we show 



sup Wh ^ 77 < n„/Mog''/^ n. 



{t,h)eB„, Geg 



Vt 



t,h 



By Lemma 



C.2 



(1) 



we see that this holds for vt^h replaced by v^ ^ . Therefore, it remains to prove 



(2) (2) (2 1) 

the statement for n^^. Let us decompose vlj^ further into vif^ = vt^h\t-i,t+i\r\[t~h,t+hY and 



v_ 



(2,2) _ ^,(2) ^ (2,1) 



t,h 



V_ 



t,h 



VI f^ = vt^h}-[t-i,t+iY- ■'^°^ ^^^ remaining part, let u,u' , such that |n — t| > 



In' — t\>h. We have 



Tv(ng^)(-)[7^-7^](r) < \\vtK'\-)[VW) - ^/m\L 

+ TV(n;y)(-)[A/^-7^]). (A.6) 
Using (I) and (II) together with the properties of the class G we can bound the variation
kS'V) [^/^) - ^/m] - vff{u') [v/^K) - v^ I by 



\v_ 



fU 



|m— m'I _|_ u2—m—r \u—u'\ ^ t 1— ?n— -r |n— m'| ^ i 1— m— r 

This yields due 



pu 

/ ^-^Trfa;|. 



: to /i > /„ > 1/n, 



_l_ Ll-m-r ' "" 



</ii-™-Mogi</ii-™-Mogn 



and with (A. 6) also 



log 



n. 



(2 2) 

Finally, let us address the total variation term involving w^^ . Given Gc 
a such that ' '" ' ' ' " ' ' .-. _ _ 



^\T I (2,1) 

ess th 
a > 1/2 and 



(A.7) 



_ volving vl'^j^^\ Given Gc,c,q we can choose 

a + q < 1 (recall that < g < 1/2). By Lemma B.7, we find that 



|4f (n)(u)" - 4f (n')(n')1 < /^^— ^| /J ^^3^ + J^dx 



Moreover 

{uni + 

and thus 



\u 



\v. 



x-t) 
+ \u\y < (1 + |n'| + |n|)'?+° < (3 + 21^ - ^1)"+° < 3 + 21-^ - *!"+' 

< 1 2-m-r I" ''I T^ -*- 



-a 



1 ^ 1^ ~ "" 

\u — tr 



;,y)(n)(n)"||7^-\/^| 
This allows us to bound the variation by 

l^lTi^n^/ain)- V^](nr-4f ^(u')[vW)- x/5(i)](^r| 

< \viX\u){ur\ |7^-7^(^| + ^|^(2,2)(^^^^^._^g2)(^,^^^,^. 

f" 1 1 

y„, (x - 1)2-<?-" ^ r.r - 1\2-» ^ 

and therefore we conclude that 



1 1 

1 1 fir 

_ f)2-q-a ^ (x- t)2-" ^ {x - tf 



{vff{-)[^M)-^M)W) 

/"OO 1 



TV (.If) 



/•oo 



'i+1 {X - ty-l-^ + (x - t)2-° + (x - t)2 "^^ - ^ 
Together with the bound for $v^{(1)}_{t,h}$ and (A.7) this shows that Assumption 1 (iii) holds.

Finally, Assumption 1 (iv) follows from Lemma B.4 and Remark 2 due to $\phi \in H^{\lceil r+m \rceil} \cap H^{r+m+1/2}$, $\mathrm{supp}\,\phi \subset [0,1]$ and $\mathrm{TV}(D^{\lceil r+m \rceil}\phi) < \infty$. This shows that Assumption 1 holds for $(v_{t,h}, V_{t,h})$.

In the next step, we verify that $(t,h) \mapsto X(t,h) = \sqrt{h}\, V_{t,h}^{-1} \int v_{t,h}(s)\, dW_s$ has continuous sample paths. Note that in view of Lemma B.6 it is sufficient to show that there is an $\alpha$ with $1/2 < \alpha < 1$, such that



whenever $(t', h') \to (t, h)$ on the space $T$. Since Assumption 1 (iv) holds, we have



l^VtM 



'^yt>M\ < ^St + ^uh TT- — ^0' for (t , /i ) ^ (t, h). 



Vt 



t,h 



Vt 



t'M 



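By Lemma B.7, $\mathrm{TV}(v_{t,h}(\cdot)\langle \cdot \rangle^{\alpha}) < \infty$. Therefore, it is sufficient to show that

$$\mathrm{TV}\big( (v_{t,h} - v_{t',h'})(\cdot)\langle \cdot \rangle^{\alpha} \big) \to 0, \quad \text{whenever } (t',h') \to (t,h). \qquad (A.8)$$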



Using (B.3), we obtain 



i^t'h <A,h){u) =Vt,h-Vt',h' 



h-^ / A^(f)-F(0p«,)(,^-<^o5,,,,.o5-,i))(.)e-(«-*)/^^s. 



Using Remark B.1, we can apply again Lemma B.3 (here $\phi$ should be replaced by $\phi - \phi \circ S_{t',h'} \circ S_{t,h}^{-1}$). In order to verify (A.8), we observe that by Lemma B.7 it is enough to show $\|\phi - \phi \circ S_{t',h'} \circ S_{t,h}^{-1}\|_{H^q} \to 0$ for some $q > r + m + 3/2$ whenever $(t',h') \to (t,h)$ in $T$. Note that

4 



\<j) — (j)0 , 



St',h' o S-^Wlj = 1 ^|(s)25|j-((x^-<^) o St,h){s) - T{{StM-)y{<l^°Se,h'))(.s) 



ds 



j=0 
9 ^ 

0=0 



+ {sf-AT{[{S,^h,{-)y -{St,h{-)y]{<t^oSt>,H')){s) ds (A.9) 



with $(S_{t,h}(\cdot))^j := \big( \frac{\cdot - t}{h} \big)^j$. Note that for real numbers $a, b$ we have the identity $a^j - b^j = \sum_{l=1}^{j} \binom{j}{l} b^{j-l}(a - b)^l$. Moreover, we can apply Lemma B.4 for $q$ with $m + r + 3/2 < q < \lfloor r + m + 5/2 \rfloor$ (and such a $q$ clearly exists). Thus, with $a = S_{t,h}(\cdot)$, $b = S_{t',h'}(\cdot)$ and $S_{t,h} - S_{t',h'} = (h'/h - 1)S_{t',h'} + (t' - t)/h$, the r.h.s. of (A.9) converges to zero if $(t',h') \to (t,h)$.

□
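The binomial identity $a^j - b^j = \sum_{l=1}^{j} \binom{j}{l} b^{j-l}(a-b)^l$ used at the end of the proof follows from expanding $a^j = (b + (a-b))^j$ and dropping the $l = 0$ term. A short sketch that evaluates the right-hand side, so that it can be compared with $a^j - b^j$ on exact (integer or rational) inputs, is given below; the function name is an illustrative choice.

```python
from fractions import Fraction
from math import comb


def power_difference(a, b, j: int):
    """Right-hand side sum_{l=1}^{j} C(j, l) * b^(j-l) * (a - b)^l."""
    return sum(comb(j, l) * b ** (j - l) * (a - b) ** l for l in range(1, j + 1))


# Exact comparison with a^j - b^j on rational inputs.
a, b = Fraction(7, 3), Fraction(-2, 5)
identity_holds = all(power_difference(a, b, j) == a ** j - b ** j for j in range(0, 8))
```

Working with `Fraction` avoids floating-point rounding, so the comparison is exact.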
Proof of Theorem\^ By assumption, we can write Pr{x,S) = aR{x^^)\^\^'^i}c^ with a^ G
5"*i and mi +71 = m! . Recall that pp(x,^) = ap{x)\£,\"^i^. Since ap is real-valued, Op(ap) 



is self-adjoint. Taking the adjoint is a linear operator and therefore arguing as in (3.4) 
yields 

^(op(pr(<Ao 5t,;,))(s) = |sri;^J-(ap(</<o V))(s) + |sr^.;'^^^(Op(a|e)(</<o 5,,;,))(s). 
Decompose vt^h = Vt I + f) ^^ with 






,(2), 



vY^l{u) := j X>^l{s)T{Op{a*j,){^oSt,H)){s)e^'''ds 



using similar arguments as in (B.3) and aj ^(x, ^) := h"^^a*^{xh + t,h ^^). For j = 1, 2, we 
denote by T^^^ and Tn the statistics T^^/i and T^ with vt^h replaced by v^f^, j = 1,2, 
respectively. Recall the definitions of a and r and set 



v[^f,iu) ■.= Aap{t) j \sY+"h-P-^^F{ct>oSt^h){s)e'''^<is 



= Aap{t)DlD^_(P{^). (A.IO) 

Further let V^f^ := \\v[^^\\2 = |Aap(i)|||Z?;+"^0((- - t)//i)||2 = h^/^—"'\Aapit)\\\Dl+'^4^, 
and 

,(1)/ 






r„^'W'-(Ty) := sup z.,1^^ ^V^ ^' -^21og)^|. 



Note that for the approximation of r„ , we can write 

{t,h)eBn \ Vt,h 



T„^'-W= sup «;,(^ ^ ^-^21og)^l. 



Since $|T_n^{P} - T_n^{P,\infty}(W)| \le |T_n^{P} - T_n^{P,(1)}| + |T_n^{P,(1)} - T_n^{P,(1),\infty}(W)| + |T_n^{P,(1),\infty}(W) - T_n^{P,\infty}(W)|$, it is sufficient to show that there exists a Brownian motion $W$ such that the terms on the right hand side converge to zero in probability. This will be done separately, and proofs for the single terms are denoted by (I), (II) and (III). From (II) and (III), we will be able to conclude the boundedness of the approximating statistic.

(I): It is easy to see that for a constant K, \\vlil\\2 < Kh^''^^^ "'" =: V/j^. By Remark 2 and 



\T^ -T^'W\ 



< sup — -p sup Wh ( — , — ^ 

/ie[/„,n„] ^t,ft \{t,h)£B„ ^ y9nit)V^\ 



21og0^)j+ sup ^^V^logd^) 



we can apply Theorem ^ (where m should be replaced by m', of course). Because of 



u™ "^ logn — ;> 0, (/) is proved. 



(II): We show that there is a Brownian motion W such that |r„ — T„ (W^)| ^ 



.P,(l) ^P,(l),oo, 



-^,(1) ^(1)| , \^{^) ^(l).°o 



u''^' - ta^i + \n'' - n'>'^{w)\ + \n'>^^{w) - t:;^''>'^{w)\ = op{i) with 



(l),oo. 



-.P,{l),oo, 



T^ ^ := sup Wfe( ^^ . . 

(t,ft)eB„ V5n(*) ll^t/llb 



2 log 



and 



r«'-(VF):= sup 



,(1) 



u^h 



(t,h)eBn 



fRevl^>{s)dWs 



21og(|) • 



Since by Assumption 4, Op G 5'^ is elliptic and pp G 5"^, we find that \Tn —Tn °°{W)\ 
op(l) and 



j;(i),oo(^)< sup u;;, 
(t,/i)Gr 



,(1) 



fRev^^l>{s)dWs 
V9n{t) Wv^^lh 



2 log 



< oo a.s. 



(A.ll) 



by applying Theorem 2 Moreover, similar as in (A. 4) and using Wh^ 2\og {j^ > 1 



'rt,h \\'"t,h\\A /. , ^fil 

sup|i;'^^' -i^-^l S sup Why! -Ziog^j^j — p l + supZ^^ 

Geg (t,/i)eB„ ^ l^ih ^ Gee 



|r„^'«-T«|< sup u;M/21og 



and 



|4l),°o(^)_^P,(l),oo(^)|< g^p ^,^21og(|^) 

{t,h)eBn ^ 



,(l)l 



,^ K^JMM f 1 + T«.oo(^) 






To finish the proof for (//) , it remains to verify 



II P - (^)|l 
sup u;/,^21og(^) :^ = o(l), 



{t,h)eBn 



K 



(A.12) 



t,/i 
which will be done below.



(Ill): By Lemma B.6, we obtain |T„ 



^P,(l),oo 



T 



P,00| 



op(l) if for some a > 1/2, 



Tv(«,-4'i)(r) ,^, 

sup Wh r-p =o(l). 



(A.13) 



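Let $\chi$ be a cut function, i.e. $\chi \in \mathcal{S}$ (the Schwartz space), $\chi(x) = 1$ for $x \in [-1,1]$ and $\chi(x) = 0$ for $x \in (-\infty,-2] \cup [2,\infty)$, and define $p^{(1)}_{t,h}(x,\xi) = h^{-1}\chi(x)(a_P(xh + t) - a_P(t))$ and $p^{(2)}_{t,h}(x,\xi) = (xh)^{-1}(1 - \chi(x))(a_P(xh + t) - a_P(t))$. Then, $p^{(1)}_{t,h}, p^{(2)}_{t,h} \in S^0$ and $(a_P(\cdot\, h + t) - a_P(t))\phi = h\,\mathrm{Op}(p^{(1)}_{t,h})\phi + h\,\mathrm{Op}(p^{(2)}_{t,h})(x\phi)$. Define the function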


and note that 



^is(u—t)/h( 



^HQ{- 



Ai-A'j,ni->^\srH<p){s)ds 



(A.14) 



dtAl < h'^'"- I {if'^''^-''^Tms)\'ds < /il+2/30-2. 



2 



with /3q := /3o A (m, + r). Using (B.2), we have now the decomposition 






vlh 



^mfl (1) 



m,0 (2) 



hKHPll + hK-^yt2 + Mt)h--d,, 



(A.15) 



where we have to replace $\phi$ by $x\phi$ in the second term of the right hand side. By assumption there exists $q > m + r + 3/2$ such that $\phi \in H^q$. Since the assumptions on $p^{(1)}_{t,h}$ and $p^{(2)}_{t,h}$ of Lemma B.3 can be easily verified, we may apply Lemma B.3 to the first two terms on the right hand side of (A.15). This yields together with Lemmas B.7, B.8 and B.9, uniformly over $(t,h) \in T$,

TV («. - vlllxr) < TV {{hK^^p^^l + hK'-^p^l + ap(t)/.-™d,,) irh-W]) 

+ TV «,(r%\[t-i,t+i]) + TV (t;S(r%\[t-i,t+i]) 



<h 



1— m— r 



• + /i^o^ 



+ h 



l—r—m 



Since m + r > 1/2, this shows (A.13). From the decomposition (A.15) we obtain further 



-"tA 



■'t,h\ 



2 < /i3/2-'n-r- _^ ^i/2+/3* -m-r ^^j^^j ^j^jg gj^^^g (IaI^. Thus the first part of the 



theorem is proved. 



Finally, with Lemma B.6 it is easy to check that (A.11) implies that (3.13) is bounded, since (A.12) and (A.13) also hold with $B_n$ and $o(1)$ replaced by $T$ and $O(1)$, respectively. □
Appendix B Lemmas for the proof of the main theorems

We have the following uniform and continuous embedding of Sobolev spaces.

Lemma B.1. Let $(p_i)_{i \in I} \subset S^m$ be a symbol class of pseudo-differential operators. Suppose further that for $\alpha \in \{0,1\}$, $k \in \mathbb{N}$, and finite constants $C_k$, only depending on $k$,

$$\sup_{i \in I} |\partial_x^k \partial_\xi^\alpha p_i(x,\xi)| \le C_k (1 + |\xi|)^m, \quad \forall x, \xi \in \mathbb{R}.$$

Then, for any $s \in \mathbb{R}$, there exists a finite constant $C = C(s, m, \max_{k \le 1+2|s|+2|m|} C_k)$, such that for all $\phi \in H^s$,

$$\|\mathrm{Op}(p_i)\phi\|_{H^{s-m}} \le C \|\phi\|_{H^s}.$$

Proof. This proof requires some subtle technicalities, appearing in the theory of pseudo-differential operators. First note that for any symbol $a \in S^0$ there exists a universal constant $C_1$ (which is in particular independent of $a$), such that

$$\|\mathrm{Op}(a)u\|_2 \le C_1 \max_{\alpha,\beta \in \{0,1\}} \|\partial_x^\alpha \partial_\xi^\beta a(x,\xi)\|_{L^\infty(\mathbb{R}^2)} \|u\|_2, \quad \text{for all } u \in L^2 \qquad (B.1)$$

(cf. Theorem 2 in Hwang [26]). For $r \in \mathbb{R}$ denote by $\mathrm{Op}(\langle \xi \rangle^r)$ the pseudo-differential operator with symbol $(x,\xi) \mapsto \langle \xi \rangle^r$. It is well-known that this symbol is indeed in $S^r$. Throughout the remaining proof let $C = C(s, m, \max_{k \le 1+2|s|+2|m|} C_k)$ denote a finite but unspecified constant, which may even change from line to line. Note that it is sufficient to show that uniformly in $\psi \in L^2$,

$$\|\mathrm{Op}(\langle \xi \rangle^{s-m}) \circ \mathrm{Op}(p_i) \circ \mathrm{Op}(\langle \xi \rangle^{-s})\psi\|_2 \le C \|\psi\|_2$$

(set $\phi = \langle D \rangle^{-s}\psi$). The composition of two operators with symbols in $S^{m_1}$ and $S^{m_2}$, respectively, is again a pseudo-differential operator and its symbol is in $S^{m_1+m_2}$. Therefore, $\mathrm{Op}(\langle \xi \rangle^{s-m}) \circ \mathrm{Op}(p_i) \circ \mathrm{Op}(\langle \xi \rangle^{-s})$ has a symbol in $S^0$. Set $p_{0,i}$ for its symbol. With (B.1) the lemma is proved, once we have established that

$$\sup_i \max_{\alpha,\beta \in \{0,1\}} \|\partial_x^\alpha \partial_\xi^\beta p_{0,i}(x,\xi)\|_{L^\infty(\mathbb{R}^2)} \le C < \infty.$$

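It is not difficult to see that $\mathrm{Op}(p_i) \circ \mathrm{Op}(\langle \xi \rangle^{-s}) = \mathrm{Op}(p_i \langle \xi \rangle^{-s})$. By Theorem 4.1 in [2], $p_{0,i} = \langle \xi \rangle^{s-m} \# (p_i \langle \xi \rangle^{-s})$, where $\#$ denotes the Leibniz product, i.e. for $p^{(1)} \in S^{m_1}$ and $p^{(2)} \in S^{m_2}$, $p^{(1)} \# p^{(2)}$ can be written as an oscillatory integral (cf. [39])

$$(p^{(1)} \# p^{(2)})(x,\xi) := \mathrm{Os}\text{-}\iint e^{-iy\eta} p^{(1)}(x, \xi + \eta)\, p^{(2)}(x + y, \xi)\, dy\, đ\eta := \lim_{\epsilon \to 0} \iint \chi(\epsilon y, \epsilon \eta)\, e^{-iy\eta} p^{(1)}(x, \xi + \eta)\, p^{(2)}(x + y, \xi)\, dy\, đ\eta,$$

for any $\chi$ in the Schwartz space of rapidly decreasing functions on $\mathbb{R}^2$ with $\chi(0,0) = 1$. Further, for $a \in S^m$ and arbitrary $l \in \mathbb{N}$, $2l > 1 + m$,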



Os 



,-wv 



a{y,ri)dya:r] 



-^y^{y)-\l - 52)[(^)-2'(l - d'yya{y,v)]dy(tr, 



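and the integrand on the r.h.s. is in $L^1$ (cf. [39], p. 235). This can also be used to show that differentiation and integration commute for oscillatory integrals,

$$\partial_x^\alpha \partial_\xi^\beta\, \mathrm{Os}\text{-}\iint e^{-iy\eta} a(x,y,\xi,\eta)\, dy\, đ\eta = \mathrm{Os}\text{-}\iint e^{-iy\eta}\, \partial_x^\alpha \partial_\xi^\beta a(x,y,\xi,\eta)\, dy\, đ\eta.$$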

Using Peetre's inequality, i.e. (^ + r/)* < 2l*l(^)l''l(?7)*, we see that for a,/3 G {0,1} and 
{x,^) fixed, the function (y, ry) i— )• dxd?{^ + r})^~^pi{x + y,i){£)^^ defines a symbol in 
S^-'^. Hence, for ^ G N, 1< 2£ - |s - m| < 2, a, /3 G {0, 1}, 

^'y^{y)-\l-dl)[{^)-^\l-dlYdP,d1{i + ^Y-^p,{x + y,im-Vy3:r^- 



Using the imposed uniform bound on d^dfp{x,^), we obtain by treating the cases a = 
and a = 1 separately. 



sup\d!id^Po^i{x,()\ 

i 



using Peetre's inequality again and 2i > 1 + |s — m| for the second estimate. Since for 
g G M, (0« G 5«, it follows that |9^(01 < (0^"" and since (.) > 1, 

2 
fc=0 

Similar for the second term. Application of Peetre's inequality as above completes the 
proof. D 
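Peetre's inequality, $\langle \xi + \eta \rangle^s \le 2^{|s|} \langle \xi \rangle^{|s|} \langle \eta \rangle^s$, is used twice in the proof above. The following sketch checks it on a grid, under the assumption (made here for illustration) that $\langle x \rangle = (1 + x^2)^{1/2}$; the function names are hypothetical.

```python
import math


def bracket(x: float) -> float:
    """Japanese bracket <x> = (1 + x^2)^(1/2) (assumed convention)."""
    return math.sqrt(1.0 + x * x)


def peetre_gap(xi: float, eta: float, s: float) -> float:
    """Right-hand side minus left-hand side of Peetre's inequality."""
    lhs = bracket(xi + eta) ** s
    rhs = 2.0 ** abs(s) * bracket(xi) ** abs(s) * bracket(eta) ** s
    return rhs - lhs


def peetre_holds_on_grid(s: float, points) -> bool:
    """True if the inequality holds for all pairs (xi, eta) from points."""
    pts = list(points)
    return all(peetre_gap(xi, eta, s) >= 0.0 for xi in pts for eta in pts)
```

Note that the exponent $s$ may be negative, which is the case needed when moving weights such as $\langle \xi \rangle^{-s}$ across a shift.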



Lemma B.2. Work under the assumptions of Theorem 2. If $v_{t,h}$ is given as in (3.6), then

$$\|v_{t,h}\|_2 \gtrsim h^{1/2-m-r}.$$

Proof. We only discuss the case $\gamma > 0$. If $\gamma = 0$ the proof can be done similarly. It follows from the definition that

l + IsP^ 



\vt,h\\2 



\Hfe)i-s) 



\T{Op{a*){^oSt,h)){s)\"ds 



T{Op{a^){^oSt,H)) 



Hh){- 
Since the adjoint is given by a*{x, ^) = e^^a(x, ^) in the sense of asymptotic summation, it
follows immediately that a*{x, ^) = 0(2;, ^) + r[x, ^) with r G 5'™--i_ From this we conclude 
that Op(a*) is an elliptic pseudo-differential operator. Because of a* £ 5™ and ellipticity 
there exists a so called left parametrix (a*)~^ G S~"^ such that Op((a*)~^) Op(a*) = 
l + Op(a') and a' G 3'°°, where 5"°° = f]^ 5*" (cf. Theorem 18.1.9 in Hormander [SSj). In 
particular, a' G S~^. Moreover, Op((a*)~^) : //'"+T — )■ f{^+"^ ig a continuous and linear and 
therefore bounded operator (cf. Lemma B.l). Furthermore, by convexity, l + jsp''' > 2{s)'^^ 



and there exists a finite constant c > such that 
^^^'^'' \T{Op{an{<PoSt,h)){s)\'ds 



\nfe){-s) 

>2Cf\\0p{a'){^oSt,h)fHr+-y > ||Op((a'^)-i)Op(a*)(0o5t,/,)|||.+™ 

= ll(l + 0p(a'))((/'o5i,/j)||^,+„, > (||(/)o5j,/j||j:^r+m - \\ Op{a){4> o St,h)\\ m+^) 

> (110 O'S't./l II //'■+'" -C||0o5t,ft||j:^r+m-l) 

>h f {1 + \j^\Y^''\T{<p){s)fds + 0(/i2(i— )) 



On the other hand, we see immediately that 
T{Op{a*){<PoSt,h)) 2 



< II Op(a^)(0o V)||i„ < 1100^11^,,+^ < /ii-2(^'+-). 



Hm-) 

Since G L^ and h tends to zero the claim follows. D 

Note that for bounded intervals $[a,b]$, partial integration $\int_a^b f'g = fg\,|_a^b - \int_a^b fg'$ holds whenever $f$ and $g$ are absolutely continuous on $[a,b]$. As a direct consequence, we have $\int_{\mathbb{R}} f'g = -\int_{\mathbb{R}} fg'$ if $f'$ and $g'$ exist and $fg, f'g, fg' \in L^1$.

In order to formulate the key estimate for proving Theorems [2] and |3j let us introduce for 
fixed (f>, a generic symbol a*-*''*) G 5"^, and A = A^ as in (|3.7[) 



(Kl^^a^t,h)^{u) = h-"" /"A(^)^(Op(o(*''^))(/))(s)e^^(«-*)/^crs. (B.2) 

From the context it will be always clear to which (/) the operator K'^'^ay'^' refers to. To 
simplify the expressions we do not indicate the dependence on (/> and /^ explicitly. 

Remark B.l. Recall ( |A.5[ ) and note that if a £ S"^ then also a^^ G 5™. Due to 
we obtain for $v_{t,h}$ in (3.6) the representation

vtMn) = h-"^ /A^(f)^(Op«,)0)(.)e-("-*)/'^^. = (i^,X<,)(n). (B.3) 



Lemma B.3. For a*^*'^) G S"^ and -y + m = m let K^'^a^^'^') be as defined in (B.2). Work 
under Assumption^ and suppose that 

(i) (/)€ Hi with q> m + r + 3/2, 
(a) 7 G {0} U [l,oo), and 

(Hi) for k gN, a £ {0, 1, . . . , 5}, there exist finite constants Ck, such that 
sup \d^d?a'-^^''Hx,C)\ < Ckil + 1^1)", for all x,^ G M. 



{t,h)er 



Then, there exists a constant C = C{q,r,y,m,Ci,Cu,^oaaxk<4:qCk) (Ci and Cu as in As- 
suniption[^ such that for {t,h) G T, 



(^) \iKlC^^'''^)iu)\<CmH^h 



, '"~^-in(l,(^), 



(tt) |(K2!ra^*'''^)(«) - (^iTa^*'^^ )("')! < '^II^IIhI^""'""'^!^ - ^'1 and for u,u' / t, 
|(i^,Xa(*''^))(n) - (KXa(*''^))(^')l < CMhI ,,_JI~_V 

rdxl. 



H^ 



1— m— r I 



x-t] 



Proof. During this proof, $C = C(q, r, \gamma, m, C_l, C_u, \max_{k \le 4q} C_k)$ denotes an unspecified constant which may change in every line. The proof relies essentially on the well-known commutator relation for pseudo-differential operators $[x, \mathrm{Op}(p)] = i\,\mathrm{Op}(\partial_\xi p)$, with $\partial_\xi p : (x,\xi) \mapsto \partial_\xi p(x,\xi)$ (cf. Theorem 18.1.6 in [25]). By induction, for $k \in \mathbb{N}$,

$$x^k\, \mathrm{Op}(a^{(t,h)}) = \sum_{r=0}^{k} \binom{k}{r} i^r\, \mathrm{Op}(\partial_\xi^r a^{(t,h)})\, x^{k-r}. \qquad (B.4)$$
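As a sanity check of (B.4), one can look at the simplest symbols $a(x,\xi) = \xi^n$, for which, under the standard quantization assumed here, $\mathrm{Op}(\xi^n) = D^n$ with $D = -i\,d/dx$ and $\partial_\xi^r \xi^n = \frac{n!}{(n-r)!}\xi^{n-r}$. The sketch below applies both sides of (B.4) to a polynomial test function, represented by its list of complex coefficients. All helper names are illustrative and not taken from the source.

```python
from math import comb, factorial


def D(poly, times: int = 1):
    """Apply D = -i d/dx `times` times to a polynomial (coefficient list)."""
    for _ in range(times):
        poly = [-1j * k * c for k, c in enumerate(poly)][1:] or [0j]
    return poly


def times_x_power(poly, k: int):
    """Multiply a polynomial by x^k (shift the coefficients)."""
    return [0j] * k + list(poly)


def add(p, q):
    """Add two polynomials of possibly different degree."""
    n = max(len(p), len(q))
    p = list(p) + [0j] * (n - len(p))
    q = list(q) + [0j] * (n - len(q))
    return [a + b for a, b in zip(p, q)]


def left_side(f, k: int, n: int):
    """x^k Op(xi^n) f = x^k D^n f."""
    return times_x_power(D(f, n), k)


def right_side(f, k: int, n: int):
    """sum_r C(k, r) i^r Op(d_xi^r xi^n) x^(k-r) f; terms with r > n vanish."""
    total = [0j]
    for r in range(min(k, n) + 1):
        falling = factorial(n) // factorial(n - r)
        weight = comb(k, r) * (1j ** r) * falling
        term = D(times_x_power(f, k - r), n - r)
        total = add(total, [weight * c for c in term])
    return total


def max_abs_difference(f, k: int, n: int) -> float:
    """Largest coefficient-wise discrepancy between both sides of (B.4)."""
    diff = add(left_side(f, k, n), [-c for c in right_side(f, k, n)])
    return max(abs(c) for c in diff)
```

For example, `max_abs_difference([1, 2 - 1j, 0, 3j, 5], 3, 2)` should vanish up to rounding. This only covers polynomial symbols in $\xi$, which is a much smaller class than $S^{m}$, but it makes the combinatorics of the induction visible.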

As a preliminary result, let us show that for A; = 0, 1, 2 the L^-norms of 

(.) D', A(f)j-(Op(a(*''^))0)(.), 
are bounded by C\\4>\\jjih~'^~'^. Using Assumption ^ and Lemma 



B.l 



(B.5) 
this follows immedi- 



ately for A; = and q > r + m + 3/2 by 

is) A(f)^(Op(a(*''^))(/<)(s) ds < C^'h-'-'\\{Y+^+^' J-(Op(a(*'^))</.)||^ 



< Ch- 



-r—j\ 



Op(a 



{t,h)^ 



< Ch^^'^^\ 



\Hi- 



(B.6) 
Now, a(*''') G 5*™ implies that for A; G N, a.^o^*''') G 5"""'= C 5"". Since by (IrI), Assumptions



(i) and (iii), and Lemma B.l 



\\{xf Op(a(*-'^))<^||i < 11(1 + x') Op(a(*''^))0||2 < C||<A||hj. < oo, (B.7) 

we obtain for j G {1, 2}, 

Z)^-F(Op(a(*'^))0) = (-i)-'-F(x-'Op(a(*''^))0)(s) 
by interchanging differentiation and integration. Exphcit calculations thus show 

Z),A(f)^(Op(a(*''^))</.)(5) = (Z),A(f))^(Op(a(*'^))</.)(s)-iA(f)^(xOp(a(*''^))0)(s) 
and 

D2A(A)^(Op(a(*-''))0)(.) = (Z?2^(^))^(0p(a(*''^))<A)(.) -2i(Z?.A(^))^(xOp(a(*''^))0)(.) 

-A(f).F(x2 0p(a(*''^))c/>)(.). 

(B.8) 



To finish the proof of (B.5) let us distinguish two cases, namely (!) 7 G {0} U [2, 00) and 

{II) 7 e {1,2). 

(I): For k = 0,1,2 and s / 0, we see by elementary calculations, | (s)DJA(^) | < Ch^^'^''{s)^'^'^^^. 
Using (B.4) and arguing similar as for (B.6) we obtain (replacing (p by x(j) or x'^cj) if neces- 
sary) bounds of the L^-norms which are of the correct order ||i;^||j:^>j/i~^~'^. 

(II): In principal we use the same arguments as in (/) but a singularity appears by expanding 



the first term on the r.h.s. of (|B.8|). In fact, it is sufficient to show that 



1 j^2\ s\■y,-^J. 



s\h\ 



^i,U .,-^(Op(o(*''^))</>)(s)d.<Q/i-^-^||^(Op(o(*''^))<A)|L / i^r^ds 

< Cih-^-^ Op(a(*''^))(A||i < Ch-^-^UWH^, 



where the last inequality follows from (B.7). Since this has the right order h ''' '''||</'||//9, 



(B.5) follows for 7 G (1,2). 



Together (/) and (//) prove (B.5). Hence we can apply partial integration twice and obtain 
for t ^ u. 



(i^,X«^*''^)(«) 



Mn-t)/h 1^2^(1) j-(Op(a(*''^))0)(s)5s (B.9) 



(u-t)2 

and similarly, first interchanging integration and differentiation, 

Du{K2^^a^'^^^){u) = ih-^-^ /"e*^("-*)/'^sA(^).F(Op(a(*'''))0)(s)crs 



{u - ty^ 



e'<"^-t)/hD^^sX{f^)j^{0p{a'^''^^)4>){s)ds (B.IO) 
(i): The estimates $|(K^{\gamma,\overline{m}}_{t,h} a^{(t,h)})(u)| \le C\|\phi\|_{H^q} h^{-m-r}$ and $|(K^{\gamma,\overline{m}}_{t,h} a^{(t,h)})(u)| \le C\|\phi\|_{H^q} h^{2-m-r}/(u-t)^2$ follow directly from (B.6) as well as (B.9) together with the $L^1$ bound of (B.5) for $k = 2$.

(ii): To prove $|(K^{\gamma,\overline{m}}_{t,h} a^{(t,h)})(u) - (K^{\gamma,\overline{m}}_{t,h} a^{(t,h)})(u')| \le C\|\phi\|_{H^q} h^{-m-r-1}|u - u'|$ it is enough to note that $|e^{ix} - e^{iy}| \le |x - y|$. The result then follows from (B.6) again. For the second bound, see (B.10). The estimate for the $L^1$-norm of (B.5) with $k = 2$ completes the proof. □
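The elementary estimate $|e^{ix} - e^{iy}| \le |x - y|$ used in part (ii) follows from $|e^{ix} - e^{iy}| = 2|\sin((x-y)/2)|$. A small numerical sketch is given below; the function names and the tolerance are illustrative choices.

```python
import cmath


def lipschitz_gap(x: float, y: float) -> float:
    """|x - y| - |exp(ix) - exp(iy)|, which should be non-negative."""
    return abs(x - y) - abs(cmath.exp(1j * x) - cmath.exp(1j * y))


def exp_is_one_lipschitz(points, tol: float = 1e-12) -> bool:
    """Check the estimate for all pairs, allowing for rounding error."""
    pts = list(points)
    return all(lipschitz_gap(x, y) >= -tol for x in pts for y in pts)
```

In the proof the estimate is applied with $x = s(u-t)/h$ and $y = s(u'-t)/h$, which produces the factor $|s|\,|u-u'|/h$.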



Let $\lceil x \rceil$ be the smallest integer which is not smaller than $x$.

Lemma B.4. Let < i < 1/2 and q > 0. Assume that 4> £ iJ^?! n H'^+\ supp(j) C [0, 1] 
and TVp^gl^) < oo. Then, for h < h' , 



UoSt,h - o St',h'\\H^ < h-'l^\t - t'\^^ + \h' - h\. 
In particular, for cp G i/r^+ml p ^r+m+i/2^ suppc/) C [0, 1] and TV{D^''+'^'^(j)) <oo, h< h' , 

\\vt,h - vt',k'h < h-'-^y'lt-t'l + lh'-hl. 



Proof. Since 



S-, 



t,h 



s, 



t'A' 



\Hi 



< 



(.)2^|l-e-(-*')n^(0(,))(.)|^rf. + ||0(,)-^(^)|| 



2 
HI 



and ll-e*'*^*-*')! < 2min(|s||t- 1'|, 1) < 2min(|s|^|i-t'|^ 1) < 2|s|^|t-t'|^ we obtain (note 
that (j) £ i?'?+^) 



st,H-^os,,,,\\l,<\t-trh'-"^-''+Mj,)-cp{j,) 



|2 
\hi- 



Set k=\q]. Then, 



;.)-^(F)iiH..™^^^-^ik-^(^-)iii 



</ii-2<?|U_A//. .M|2^,.i-2«||n'=, 



9 



V )j\\2- 



For jE{0,/c}, 

\\DUd 



^))||^<2||0O-)-0O-)(A.)||^ + 2(l-(A)^y||^O-)(A.)|l 



Now, application of Lemma C.5 completes the proof for the first part. The second claim 
follows from 

\\vt,h-vt',h'\\2= / \Hs)\'^\^{Op{a*'){(t>o St,h-<po St'ji'))){s)\ ds 



U 



II I I z 

< \\(t>° St,h -4)0 St'^h'WlJr 
Lemma B.5. Let $A_{t,t',h,h'}$ be defined as in (A.3) and work under Assumption 1. Then, for a global constant $K > 0$,

$$A_{t,t',h,h'} \le K \sqrt{|t - t'| + |h - h'|}.$$
Proof. Without loss of generality, assume that for fixed (t, h), Vt^h ^ ^t',/i'- We can write 



1 



or 



. . \m,h\'h-tpt',h'Vh'\\2 , /77 
At,t',h,h' < 77 ^ y h' \\ipt' ,h' \\2 



< 



Vt,h 
'H,h\fh-tpt',h'Vh!\\2 



Vt 



+ Vh 



Vt,h Vf^h' 

-,\yt,h — yt',h'\ 



t,h 



Vt 



t,h 



By triangle inequality, HV't.hV^ - V'i'./i'V/i'lb < \/^||V't,/i - V'i'./i'lb + |V^ - VhJ 
Thus, 



^i,/i 2- 



,h -#,/!' lb + |Vi,h -yt',h>\] + V\h-h 



Vh/ 

At,t',h,h' < T^i ir</^t 

If $h' \le h$, then the result follows by Assumption 1 (iv) and some elementary computations. Otherwise we can estimate $\sqrt{h'} \le \sqrt{|h - h'|} + \sqrt{h}$ and so

At,t',h,h' < T^i\\'ipt,h -ipt',h'h + \'Vt,h - Vt'^h'l) +5's/\h- h'\. 
vt,h ^ ' 

U 



The next lemma extends a well-known bound for functions with compact support to general càdlàg functions. We found this result useful for estimating the supremum over a Gaussian process if entropy bounds are difficult.

Lemma B.6. Let $(W_t)_{t \in \mathbb{R}}$ denote a two-sided Brownian motion. For $\alpha > 1/2$, a family of real-valued càdlàg functions $\{f_i \mid i \in I\}$, and a constant $C_\alpha$, we have

$$\sup_{i \in I} \Big| \int f_i(s)\, dW_s \Big| \le C_\alpha \sup_{s \in [0,1]} |\widetilde{W}_s| \, \sup_{i \in I} \mathrm{TV}(\tilde{f}_i),$$

where $\widetilde{W}$ is a standard Brownian motion on the same probability space and $\tilde{f}_i(s) = f_i(s)\langle s \rangle^{\alpha}$.

Proof. The proof consists of two steps. First suppose that IJie/ supp fi C [0, 1] and as- 
sume that the fi are of bounded variation. Then, for all i £ I, there exists a func- 
tion Qi with ||gi||oo < TV(/i) and a probability measure Pi with Pj[Oil[= 1) such that 






fi{u) = J,Q_^-.qi{u)Pi{du) for all u G M, because /» is cadlag and thus /«(!) = 0. With 
probability one, 

sup I [ fiis)dWs\ = sup [ Wsqiis)Pi{ds) < sup \Ws\ supTV(/i). 

i&i J iei J se[o,i] «e/ 



Now let us consider the general case. If $C_\alpha := \|\langle \cdot \rangle^{-\alpha}\|_2$ then $h(s) = C_\alpha^{-2}\langle s \rangle^{-2\alpha}$ is a density of a random variable. Let $H$ be the corresponding distribution function. Note that



(Wt) 



te[o,i] 



„ ^/Hif^Wm„-,,„^_^^^^^^ 



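is a standard Brownian motion satisfying $d\widetilde{W}_{H(s)} = \sqrt{h(s)}\, dW_s$ and thus

$$\sup_{i \in I} \Big| \int f_i(s)\, dW_s \Big| = C_\alpha \sup_{i \in I} \Big| \int \tilde{f}_i(s)\, d\widetilde{W}_{H(s)} \Big| = C_\alpha \sup_{i \in I} \Big| \int_0^1 \tilde{f}_i(H^{-1}(s))\, d\widetilde{W}_s \Big|.$$

Since $\mathrm{TV}(\tilde{f}_i \circ H^{-1}) = \mathrm{TV}(\tilde{f}_i)$ the result follows from the first part.

□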
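The time change in the proof relies on $h(s) = C_\alpha^{-2}\langle s \rangle^{-2\alpha}$ being a probability density, which requires $\alpha > 1/2$. Under the assumption (made here for illustration) that $\langle s \rangle = (1+s^2)^{1/2}$, the normalizing constant has the closed form $C_\alpha^2 = \sqrt{\pi}\,\Gamma(\alpha - 1/2)/\Gamma(\alpha)$. The sketch below computes it and checks the total mass numerically after the substitution $s = \tan\theta$; the function names are hypothetical.

```python
import math


def c_alpha_squared(alpha: float) -> float:
    """C_alpha^2 = integral of (1 + s^2)^(-alpha) over the real line."""
    if alpha <= 0.5:
        raise ValueError("the integral is finite only for alpha > 1/2")
    return math.sqrt(math.pi) * math.gamma(alpha - 0.5) / math.gamma(alpha)


def density(s: float, alpha: float) -> float:
    """The density h(s) = C_alpha^{-2} <s>^{-2 alpha}."""
    return (1.0 + s * s) ** (-alpha) / c_alpha_squared(alpha)


def total_mass_numeric(alpha: float, cells: int = 100_000) -> float:
    """Midpoint rule for the mass of h after substituting s = tan(theta),
    which turns the integrand into cos(theta)^(2 alpha - 2) / C_alpha^2."""
    step = math.pi / cells
    total = 0.0
    for k in range(cells):
        theta = -math.pi / 2.0 + (k + 0.5) * step
        total += math.cos(theta) ** (2.0 * alpha - 2.0)
    return total * step / c_alpha_squared(alpha)
```

For $\alpha = 1$ this is the Cauchy case with $C_1^2 = \pi$. For $\alpha$ close to $1/2$ the midpoint rule converges slowly because of the endpoint singularity, so the numerical check is most reliable for $\alpha \ge 1$.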



Remark 3. For the proofs of the subsequent lemmas, we make often use of elementary 
facts related to the function (•)" E S'^ with < a < 1. Note that for t G [0, 1], Du{u)" < 
a{u)''-^ G S'^-\ Du{u)'' < a, 



1 



(u)° < -(l + |n|") < l + l-u-tp, and (u)''- ' < 2\u - t 



la-l 



(B.ll) 



where the last inequality follows from \u — t\ "(u)" < \u\ "(n)" + 1 < 2. 

Lemma B.7. For $(t,h) \in T$ let $r_{t,h}$ be a function satisfying the conclusions of Lemma B.3 for $r$, $m$ and $\phi$. Assume $1/2 < \alpha < 1$. Then, there exists a constant $K$ independent of $(t,h) \in T$ and $\phi$ such that


\rtMu){ur-rtA^'){ur\<KmH.h 



1—m—r 



+ 



,, (x-t)2— (x-t) 



zdx 



for all u,u' ^ t and 






1—m—r 



Proof. Let $C$ be as in Lemma B.3. In this proof $K = K(\alpha, C)$ denotes a generic constant which may change from line to line. Without loss of generality, we may assume that $|u - t| \ge |u' - t|$. Furthermore, the bound is trivial if $u' \le t \le u$ or $u \le t \le u'$. Therefore, let us assume further that $u \ge u' > t$ (the case $u \le u' < t$ can be treated similarly). Together with the conclusions from Lemma B.3 and Remark 3 this shows that

\rtMu){nr-rt,h{^^'){uT\ < \rtM^)\ |(tx)° - {uT\ + {nThh{^) - rt,h{u')\ 

1 Ui' _ f\^ _i_ 1 

2—m—r ' 1— m— r I"- ^1 n^ J- 



< K 



m 



h' 



+ h 



\u — u \. 



[u-ty ' '" \u' -t\ \u-t\ 

Clearly, the second term in the bracket dominates uniformly over h G (0,1]. By Taylor 
expansion 



\u — u 



u — u 



\u' - t|i-" \u -t\ {u- tY{u' - t)i-"(n - t)i-" 

(n-t)i-"-('u'-t)i-" 



< 



(l-a)(u'-t)i-°(n-t)i- 



u' (x-ty- 



-dx. 



Hence 



\u' — t\ \u — t\ 



u — u 



u' {x-ty 



rfix 



completes the proof for the first part. For the second part decompose ?'t,/il[i-i,t+i] in 
r^l = rt^h\[t-h,t+h] and r^^ = ri^/jl[(_i j+ij — r^^. Observe that the conclusions of Lemma 



B.3 imply 

TV(rg(r) < |KrVM+/.]llooTV(r«)+TV((rVM+/.])lkSlloo <^II<^IIhi/^-'"- 
By using the first part of the lemma, we conclude that uniformly in (t, h) E T, 

TV {rt,H{rh-i,t+i]) < TV {r^^l{-r) + TV {rfl{.r) < KUlMh'^^-^- + /i— ) 



andalsoTV(ri,?,(-)"I 



-i,t+i]. 



< K 



Hi' 



A—m—r 
Lemma B.8. Work under Assumptions^ and^ and suppose that m + r > 1/2, (x)(/) G

L'^, and (p G fjm+r+ ^ ^^^ ^^^^ ^^ ^^ defined in (|A.14). Then, there exists a constant K 



independent of (t, h) £ T, such that for 1/2 < a < 1, 

TV(di,;,(ri[t-i,i+i]) < i^/i''°^("^+^)^Mog(i). 



Proof. For convenience let /3q := /3o A {m + r) and substitute s i— )• — s in (A. 14), i.e. 



dt,hiu) :-- 



Define 



-is(u—t)/h 



Fh{s) :-- 



HfeM] 



1 



AtPMnL^\srnct>)i-s)(fs. 
By Assumptions 2 and 3 we can bound the $L^1$-norm of

s^{s)Ft,is)i'^\sr:Fi<P){-s) 



(B.12) 



uniformly in {t,h) by /(s)(f )^-^o|s|'"|-F((/))(-s)|(is. Bounding (f)'-'^« by {j^Y'^o and con- 
sidering the cases r < /3q and r > /3q separately, we find h^o~^ J^s)i+^+"^"^o | J^(0)(— s)|(is < 
/i^o-^||0||j^r+m+i as an upper bound for (B.12), uniformly in {t,h) € T. Furthermore, 



DsFhis) 
and by Assumptions [2] and |3| 






AruP-^h-^^r ^ 



sDsFhis) < |sZ),^(/e)(f ) 



42 2p|s 
^ '-s Ift 



l2r 



{^imi))' 



A{r^)-hP+'h\f^\'■^'D,Ti^){^) -1 



'--' WhWh 



r-1 



+ 



-/3o 



/il \h 



<2(f) 



r-/3* 



Similar as above, we can conclude that the L^-norm of 

s^DssFh{s)i':\srJ^m-s) 



is bounded by const. x/i^" '"||(/)||^r+m+i, uniformly over all {t,h) G T. Therefore, we have 
by interchanging differentiation and integration first and partial integration, 



Dudt,h{u) 



h 



se 



u — t 



-''^''-'^/^Fh{s)i'^\srT{^){-s)ds 



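and the second equality holds for $u \neq t$. Together with (B.12) this shows that $|d_{t,h}(u)| \lesssim h^{\beta_0^*-r}$ and $|D_u d_{t,h}(u)| \lesssim h^{\beta_0^*-r-1}\min(1, h/|u-t|)$. Using Remark 3 we find for the sets $A^{(1)}_{t,h} := [t-h, t+h]$ and $A^{(2)}_{t,h} := [t-1, t+1] \setminus A^{(1)}_{t,h}$,

$$\operatorname{TV}\big(d_{t,h}\,\mathbb{I}_{[t-1,t+1]}\big) \le 2\|d_{t,h}\|_\infty + \int_{A^{(1)}_{t,h}} |D_u d_{t,h}(u)|\,du + \int_{A^{(2)}_{t,h}} |D_u d_{t,h}(u)|\,du \lesssim h^{\beta_0^*-r}\log(1/h).$$

Thus, $\operatorname{TV}\big(d_{t,h}(\cdot)\langle\cdot\rangle^{\alpha}\,\mathbb{I}_{[t-1,t+1]}\big) \lesssim \|d_{t,h}\|_\infty + \operatorname{TV}\big(d_{t,h}\,\mathbb{I}_{[t-1,t+1]}\big) \lesssim h^{\beta_0^*-r}\log(1/h)$. □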
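The logarithmic factor in this bound comes only from integrating the derivative estimate over the region $h < |u-t| \le 1$. As a short worked check (a sketch that simply inserts the bound $|D_u d_{t,h}(u)| \lesssim h^{\beta_0^*-r-1}\min(1, h/|u-t|)$ from the proof, with all constants suppressed):

```latex
\begin{align*}
\int_{|u-t|\le h} h^{\beta_0^*-r-1}\,du &= 2h^{\beta_0^*-r},\\
\int_{h<|u-t|\le 1} h^{\beta_0^*-r-1}\,\frac{h}{|u-t|}\,du
  &= 2h^{\beta_0^*-r}\int_h^1\frac{dv}{v}
   = 2h^{\beta_0^*-r}\log(1/h).
\end{align*}
```

Together with $\|d_{t,h}\|_\infty \lesssim h^{\beta_0^*-r}$, the three contributions are of order $h^{\beta_0^*-r}\log(1/h)$ for small $h$.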

Lemma B.9. Work under the assumptions of Theorem 5^ and let vf^ be defined as in 
dAlol). Then, for 1/2 < a < 1, 



TV«,(riR\[i-i,m])<^^'-'-"^, 
where the constant K does not depend on {t,h). 
Proof. The proof uses essentially the same arguments as the proof of Lemma B.3. Let
q := [r + m + 5/2j and recall that by assumption (x)^0 € L^. Decomposing the L^ 
norm on M into L-'^([— 1, 1]) and L^(M \ [—1,1]) and using Cauchy-Schwarz inequality and 
\\J^{(p)\\oo < II0II1, we see that for j £ {0,1}, the L^-norm ol s ^ Di\s\''+"'L7''~^J'{(l))is) 
is bounded by const. x(||(;/)||j|^9 + ||(/'||i). Similar, for k G {0,1,2}, the L^-norms of s 1— >• 
2;)fc|^|r+m+i^-p-M+ jr^^^^j^g^ g^j.g bounded by a multiple of ||(/>||//'j + \\4>\\i- Hence, we have 

and 

Together with Remark 3 this shows that

/•oo 

TV «,(rw,oo)) < ii</.(ri[m,oo)iioo + / \D^vi„{u){-r\du 



t+i 

r"00 l1— r— m ul—r—m 



t+1 



M-tM-" U-t 



Similarly, we can bound the total variation on $(-\infty, t-1]$. □

Appendix C Further technicalities 

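Lemma C.1. Assume that $K_n \to \infty$, $\psi_{t,h} = \psi\big(\tfrac{\cdot - t}{h}\big)$ and $V_{t,h} = \|\psi_{t,h}\|_2 = \sqrt{h}\,\|\psi\|_2$. Suppose that $\lim_{j\to\infty} \log(j)\,\big|\int \psi(s-j)\psi(s)\,ds\big| = 0$. Then, with $w_h$ and $B^\circ_{K_n}$ as defined in (2.4) and (2.8), respectively,

$$\sup_{(t,h)\in B^\circ_{K_n}} w_h\Big(\frac{\big|\int \psi_{t,h}(s)\,dW_s\big|}{\|\psi_{t,h}\|_2} - \sqrt{2\log(\nu/h)}\Big) \to -\frac{1}{4} \quad \text{in probability.}$$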

Proof. Write $K := K_n$ and let $\xi_j := \|\psi_{t,h}\|_2^{-1}\int \psi_{j/K,1/K}(s)\,dW_s$ for $j = 0, \ldots, K-1$. Now, $(\xi_j)_j$ is a stationary sequence of centered and standardized normal random variables. In particular, the distribution of $(\xi_j)_j$ does not depend on $K$, and by assumption the covariance decays at a faster rate than logarithmically. By Theorem 4.3.3 (ii) in [34] the maximum behaves as the maximum of $K$ independent standard normal random variables, i.e.

$$P\big(\max(\xi_1, \ldots, \xi_K) \le a_K + b_K t\big) \to \exp(-e^{-t}), \quad \text{for } t \in \mathbb{R} \text{ and } K \to \infty,$$

where

$$b_K := \frac{1}{\sqrt{2\log K}}, \qquad a_K := \sqrt{2\log K} - \frac{\log\log K + \log(4\pi)}{2\sqrt{2\log K}}.$$
Using the tail-equivalence criterion (cf. [2], Proposition 3.3.28), we obtain further

$$\lim_{K\to\infty} P\big(\max(|\xi_1|, \ldots, |\xi_K|) \le a_K + b_K(t + \log 2)\big) = \exp(-e^{-t}), \quad \text{for } t \in \mathbb{R}.$$

Note that $T^\circ := \sup_{(t,h)\in B^\circ_K} w_h\big(\|\psi_{t,h}\|_2^{-1}\,\big|\int \psi_{t,h}(s)\,dW_s\big| - \sqrt{2\log(\nu/h)}\big)$ has the same distribution as $w_{K^{-1}}\max(|\xi_1|, \ldots, |\xi_K|) - w_{K^{-1}}\sqrt{2\log(\nu K)}$. It is easy to show that



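$$\sqrt{\log(\nu K)} = \sqrt{\log K} + \frac{\log \nu}{2\sqrt{\log K}} + O\Big(\frac{1}{\log^{3/2} K}\Big)$$

and

$$\frac{1}{w_{K^{-1}}} = \frac{\sqrt{2}\,\log\log K}{\sqrt{\log K}} + O\Big(\frac{\log\log K}{\log^{3/2} K}\Big).$$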

Assume that $\eta_n \to 0$ and $\eta_n \log\log K \to \infty$. Then for sufficiently large $n$,

IP(?^« > -3 + ^n) = P( max(|6|, . ■ • , l^i^l) > ( - 3 + r]n)/wK-^ + \/2b^^ 

max(|^i|,...,|^ii-|) > 



^ VS logic " V2 log K Mog^/^A" 



< P(^max(|a|, . . . , \iK\) >ciK + bK2r,nloglogKj -^ 0. 
Similarly,

$$P\big(T^\circ < -\tfrac{1}{4} - \eta_n\big) \le P\big(\max(|\xi_1|, \ldots, |\xi_K|) < a_K - b_K \eta_n \log\log K\big) \to 0.$$
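The centering constant $-1/4$ can be checked numerically from the deterministic part of the argument. The following Python sketch is an illustration only and not part of the proof: it evaluates $w_{K^{-1}}\,(a_K - \sqrt{2\log(\nu K)})$ using the standard Gumbel normalisation $a_K = \sqrt{2\log K} - (\log\log K + \log 4\pi)/(2\sqrt{2\log K})$ for the maximum of $K$ standard normal variables. The function name `centering`, the parametrisation by $\log K$ (so that very large $K$ can be represented), and the choice $\nu = 3$ are assumptions made for the example.

```python
import math

def centering(log_K, nu=3.0):
    """Return w_{1/K} * (a_K - sqrt(2 log(nu K))), parametrised by log K."""
    L = log_K
    c = math.log(nu)
    # w_h = sqrt(0.5 * log(nu/h)) / log(log(nu/h)), evaluated at h = 1/K
    w = math.sqrt(0.5 * (L + c)) / math.log(L + c)
    root = math.sqrt(2.0 * L)
    # a_K - sqrt(2 log K): second-order term of the normal maximum
    gumbel_shift = -(math.log(L) + math.log(4.0 * math.pi)) / (2.0 * root)
    # sqrt(2 log K) - sqrt(2 log(nu K)), written to avoid cancellation
    nu_shift = -2.0 * c / (root + math.sqrt(2.0 * (L + c)))
    return w * (gumbel_shift + nu_shift)

for log_K in (1e2, 1e10, 1e50, 1e200):
    print(log_K, centering(log_K))
```

The printed values approach $-1/4$ only slowly, with a deviation of order $1/\log\log K$, which is consistent with the requirement $\eta_n \log\log K \to \infty$ in the proof.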
Lemma C.2. Condition (iii) in Assumption 1 is fulfilled with $\kappa_n = w_{u_n} u_n^{1/2}$, whenever Condition (ii) of Assumption 1 holds and for all $(t,h) \in B_n$, $\operatorname{supp} \psi_{t,h} \subset [t-h, t+h]$.

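Proof. Let $1/2 < \alpha < 1$. Then $\langle\cdot\rangle^{\alpha} : \mathbb{R} \to \mathbb{R}$ is Lipschitz. Recall that $\operatorname{TV}(fg) \le \|f\|_\infty \operatorname{TV}(g) + \|g\|_\infty \operatorname{TV}(f)$. Since $\bigcup_{(t,h)\in B_n} \operatorname{supp} \psi_{t,h} \subset [-1, 2]$ is bounded and contains the support of all functions $s \mapsto \psi_{t,h}(s)\big[\sqrt{g(s)} - \sqrt{g(t)}\big]\langle s\rangle^{\alpha}$ (indexed in $(t,h) \in B_n$), we obtain uniformly over $(t,h) \in B_n$ and $G \in \mathcal{G}$,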

\^t,hi-) [vW) - vW)] IL + TV (V^t,h(-) [vW) - vW)]) 



< 
Furthermore,

Tv(vt,h(•)[^^^)-7^]) 

where the last inequality follows from Assumption 1 (ii) as well as the properties of $\mathcal{G}$. With Lemma C.3 (ii) the result follows. □



In the next lemma, we collect two facts about $w_h$.

Lemma C.3. For $h \in (0, 1]$ and $\nu > e$ let $w_h := \sqrt{\tfrac{1}{2}\log(\nu/h)}\,/\log\log(\nu/h)$. Then

(i) $h \mapsto w_h$ is strictly decreasing on $(0, \nu\exp(-e^2)]$, and
(ii) $h \mapsto w_h h^{1/2}$ is strictly increasing on $(0, 1]$.

Proof. With $x = x(h) := \log\log(\nu/h) > 0$, we have $\log w_h = -\log(2)/2 + x/2 - \log x$. Since the derivative of this w.r.t. $x$ equals $1/2 - 1/x$ and is strictly positive for $x > 2$, we conclude that $\log w_h$ is strictly increasing in $x(h) > 2$, i.e. strictly decreasing in $h < \nu\exp(-e^2)$. Moreover, $\log(w_h h^{1/2}) = \log(\nu/2)/2 + x/2 - \log x - e^x/2$, and the derivative of this w.r.t. $x > 0$ equals $1/2 - 1/x - e^x/2 < 0$. Since $x(h)$ is strictly decreasing in $h$, $w_h h^{1/2}$ is strictly increasing in $h \in (0, 1]$. □
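Both monotonicity statements can be checked on a grid. The following Python sketch is a numerical illustration only, not a substitute for the proof. The value $\nu = 3$, the geometric grid, and the helper names `w` and `strictly_monotone` are assumptions chosen for the example.

```python
import math

def w(h, nu=3.0):
    """w_h = sqrt(0.5 * log(nu/h)) / log(log(nu/h)), for 0 < h <= 1 and nu > e."""
    big = math.log(nu / h)
    return math.sqrt(0.5 * big) / math.log(big)

def strictly_monotone(values, increasing):
    """Check strict monotonicity of consecutive entries."""
    pairs = zip(values, values[1:])
    return all((a < b) if increasing else (a > b) for a, b in pairs)

nu = 3.0
# geometric grid h = 10^(-k/10), running from 1e-300 up to 1
grid = [10.0 ** (-k / 10.0) for k in range(3000, -1, -1)]
threshold = nu * math.exp(-math.e ** 2)  # right end point in part (i)

part_i = [w(h, nu) for h in grid if h <= threshold]
part_ii = [w(h, nu) * math.sqrt(h) for h in grid]

print(strictly_monotone(part_i, increasing=False))  # part (i)
print(strictly_monotone(part_ii, increasing=True))  # part (ii)
```

Restricting part (i) to $h \le \nu\exp(-e^2)$ matters: for larger $h$ the map $h \mapsto w_h$ is no longer decreasing, since $1/2 - 1/x$ changes sign at $x = 2$.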

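Lemma C.4. Suppose that $\operatorname{supp} f \subset [0, \infty)$ and let $0 < a \le 1$. Then,

$$\int_0^{1+a} |f(x) - f(x-a)|\,dx \le a \operatorname{TV}(f)$$

and

$$\int_0^1 |f(ax) - f(x)|\,dx \le (1-a)\operatorname{TV}(f).$$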







Proof. Without loss of generality, we can assume that $f$ is of bounded variation, i.e. $\operatorname{TV}(f) < \infty$. Hence, there exist two positive and monotone increasing functions $f_1, f_2$, such that $f = f_1 - f_2$, $f_1(u) = f_2(u) = 0$ for $u < 0$, and $f_1(\infty) + f_2(\infty) = \operatorname{TV}(f)$. Set $g = f_1 + f_2$. Then $g$ is positive and monotone as well, and

$$\int_0^{1+a} |f(x) - f(x-a)|\,dx \le \int_0^{1+a} \big(g(x) - g(x-a)\big)\,dx \le \int_1^{1+a} g(x)\,dx \le a \operatorname{TV}(f).$$

In order to derive the second inequality, note that

$$\int_0^1 |f(ax) - f(x)|\,dx \le \int_0^1 \big(g(x) - g(ax)\big)\,dx = \int_a^1 g(x)\,dx + (1 - 1/a)\int_0^a g(x)\,dx \le \int_a^1 g(x)\,dx \le (1-a)\operatorname{TV}(f).$$

□
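The two inequalities of Lemma C.4 can be illustrated numerically. The following Python sketch uses a midpoint rule and one hypothetical test function, $f(x) = \sin(2\pi x)$ on $[0,1]$ and $0$ elsewhere, for which $\operatorname{TV}(f) = 4$. The function names and the grid size are assumptions made for the example, and the check covers only this particular $f$.

```python
import math

def f(x):
    """Test function supported on [0, 1]; continuous, with TV(f) = 4."""
    return math.sin(2.0 * math.pi * x) if 0.0 <= x <= 1.0 else 0.0

TV_F = 4.0  # sin(2*pi*x) rises and falls by a total of 4 on [0, 1]

def midpoint_integral(func, lower, upper, n=20000):
    """Midpoint-rule approximation of the integral of func over [lower, upper]."""
    step = (upper - lower) / n
    return step * sum(func(lower + (i + 0.5) * step) for i in range(n))

def shift_integral(a):
    # approximates int_0^{1+a} |f(x) - f(x - a)| dx
    return midpoint_integral(lambda x: abs(f(x) - f(x - a)), 0.0, 1.0 + a)

def scale_integral(a):
    # approximates int_0^1 |f(a x) - f(x)| dx
    return midpoint_integral(lambda x: abs(f(a * x) - f(x)), 0.0, 1.0)

for a in (0.1, 0.5, 0.9):
    print(a, shift_integral(a), a * TV_F, scale_integral(a), (1.0 - a) * TV_F)
```

Each printed row lists $a$, the left-hand side and bound of the first inequality, and then the left-hand side and bound of the second inequality.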



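Lemma C.5. Suppose that $\operatorname{supp} \psi \subset [0, 1]$ and $\operatorname{TV}(\psi) < \infty$. Let $(t,h), (t',h') \in T$. Then, there exists a constant $K$ only depending on $\psi$, such that

$$\Big\|\psi\Big(\frac{\cdot - t}{h}\Big) - \psi\Big(\frac{\cdot - t'}{h'}\Big)\Big\|_2 \le K\sqrt{|h - h'| + |t - t'|}.$$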



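Proof. Note that

$$\Big\|\psi\Big(\frac{\cdot - t}{h}\Big) - \psi\Big(\frac{\cdot - t'}{h'}\Big)\Big\|_2^2 \le 2\|\psi\|_\infty \int \Big|\psi\Big(\frac{s-t}{h}\Big) - \psi\Big(\frac{s-t'}{h'}\Big)\Big|\,ds \le 2\|\psi\|_\infty \int \Big|\psi\Big(\frac{s-t}{h}\Big) - \psi\Big(\frac{s-t}{h'}\Big)\Big|\,ds + 2\|\psi\|_\infty \int \Big|\psi\Big(\frac{s-t}{h'}\Big) - \psi\Big(\frac{s-t'}{h'}\Big)\Big|\,ds.$$

Without loss of generality assume $h' \le h$. Using Lemma C.4 yields

$$\int_t^{t+h} \Big|\psi\Big(\frac{s-t}{h}\Big) - \psi\Big(\frac{s-t}{h'}\Big)\Big|\,ds \le \|\psi\|_\infty (h - h') + \int_t^{t+h'} \Big|\psi\Big(\frac{s-t}{h}\Big) - \psi\Big(\frac{s-t}{h'}\Big)\Big|\,ds = \|\psi\|_\infty (h - h') + h' \int_0^1 \Big|\psi\Big(\frac{h'}{h}u\Big) - \psi(u)\Big|\,du \le \big(\|\psi\|_\infty + \operatorname{TV}(\psi)\big)|h - h'|.$$

Similarly, assuming $t \le t'$,

$$\int \Big|\psi\Big(\frac{s-t}{h'}\Big) - \psi\Big(\frac{s-t'}{h'}\Big)\Big|\,ds = h' \int_0^{(t'-t)/h'+1} \Big|\psi(u) - \psi\Big(u - \frac{t'-t}{h'}\Big)\Big|\,du \le |t' - t| \operatorname{TV}(\psi).$$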
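For the simplest admissible kernel, the indicator $\psi = \mathbb{I}_{[0,1]}$, the Hölder-type bound of Lemma C.5 can be checked directly, because the squared $L^2$ distance is the Lebesgue measure of the symmetric difference of $[t, t+h]$ and $[t', t'+h']$, which is at most $2|t-t'| + |h-h'|$. The Python sketch below is an illustration under assumptions: the index set is taken to be $t \in [0,1]$, $h \in [0.01, 1]$, and the function names are hypothetical. For this $\psi$ the ratio should not exceed $\sqrt{2}$.

```python
import math
import random

def l2_distance_indicator(t, h, t2, h2):
    """L2 distance between the indicators of [t, t+h] and [t2, t2+h2]."""
    overlap = max(0.0, min(t + h, t2 + h2) - max(t, t2))
    # squared distance = Lebesgue measure of the symmetric difference
    return math.sqrt(max(0.0, h + h2 - 2.0 * overlap))

def worst_ratio(trials=20000, seed=0):
    """Largest observed ratio distance / sqrt(|h-h'| + |t-t'|) over random pairs."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        t, t2 = rng.random(), rng.random()
        h, h2 = rng.uniform(0.01, 1.0), rng.uniform(0.01, 1.0)
        denom = math.sqrt(abs(h - h2) + abs(t - t2))
        if denom > 0.0:
            worst = max(worst, l2_distance_indicator(t, h, t2, h2) / denom)
    return worst

print(worst_ratio())
```

The square root in the bound cannot be improved for this kernel: shifting an indicator by $\delta$ changes it on a set of measure $2\delta$, so the $L^2$ distance is of exact order $\sqrt{\delta}$.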